
Research Article

Hidden Semi-Markov Models for Predictive Maintenance

Francesco Cartella,1 Jan Lemeire,1 Luca Dimiccoli,1 and Hichem Sahli1,2

1 Electronics and Informatics Department (ETRO), Vrije Universiteit Brussel (VUB), Pleinlaan 2, 1050 Brussels, Belgium
2 Interuniversity Microelectronics Center (IMEC), Kapeldreef 75, 3001 Leuven, Belgium

Correspondence should be addressed to Francesco Cartella.

Received 9 October 2014; Accepted 28 December 2014

Academic Editor: Hang Xu

Copyright © 2015 Francesco Cartella et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2015, Article ID 278120, 23 pages. http://dx.doi.org/10.1155/2015/278120

Realistic predictive maintenance approaches are essential for condition monitoring and predictive maintenance of industrial machines. In this work, we propose Hidden Semi-Markov Models (HSMMs) with (i) no constraints on the state duration density function and (ii) applicability to continuous or discrete observations. To deal with such a type of HSMM, we also propose modifications to the learning, inference, and prediction algorithms. Finally, automatic model selection has been made possible using the Akaike Information Criterion. This paper describes the theoretical formalization of the model as well as several experiments performed on simulated and real data with the aim of methodology validation. In all performed experiments, the model is able to correctly estimate the current state and to effectively predict the time to a predefined event with a low overall average absolute error. As a consequence, its applicability to real-world settings can be beneficial, especially where the Remaining Useful Lifetime (RUL) of the machine is calculated in real time.

1. Introduction

Predictive models that are able to estimate the current condition and the Remaining Useful Lifetime of industrial equipment are of high interest, especially for manufacturing companies, which can optimize their maintenance strategies. If we consider that the costs derived from maintenance are one of the largest parts of the operational costs [1] and that the maintenance and operations departments often comprise about 30% of the manpower [2, 3], it is not difficult to estimate the economic advantages that such innovative techniques can bring to industry. Moreover, predictive maintenance, where the Remaining Useful Lifetime (RUL) of the machine is calculated in real time, has been proven to significantly outperform other maintenance strategies, such as corrective maintenance [4]. In this work, RUL is defined as the time, from the current moment, until the system will fail [5]. Failure, in this context, is defined as a deviation of the delivered output of a machine from the specified service requirements [6] that necessitates maintenance.

Models like Support Vector Machines [7], Dynamic Bayesian Networks [8], clustering techniques [9], and data mining approaches [10] have been successfully applied to condition monitoring, RUL estimation, and predictive maintenance problems [11, 12]. State space models, like Hidden Markov Models (HMMs) [13], are particularly suitable for industrial applications, due to their ability to model the latent state, which represents the health condition of the machine.

Classical HMMs have been applied to condition assessment [14, 15]; however, their usage in predictive maintenance has not been effective due to their intrinsic modeling of the state duration as a geometric distribution.

To overcome this drawback, a modified version of HMM, which takes into account an estimate of the duration in each state, has been proposed in the works of Tobon-Mejia et al. [16–19]. Thanks to the explicit state sojourn time modeling, it has been shown that it is possible to effectively estimate the RUL for industrial equipment. However, the drawback of their proposed HMM model is that the state duration is always assumed to be Gaussian distributed and the duration parameters are estimated empirically from the Viterbi path of the HMM.

A complete specification of a duration model, together with a set of learning and inference algorithms, was first given by Ferguson [20]. In his work, Ferguson allowed



the underlying stochastic process of the state to be a semi-Markov chain, instead of the simple Markov chain of an HMM. Such a model is referred to as a Hidden Semi-Markov Model (HSMM) [21]. HSMMs and explicit duration models have been proven beneficial for many applications [22–25]. A complete overview of different duration model classes has been made by Yu [26]. Most state duration models used in the literature are nonparametric discrete distributions [27–29]. As a consequence, the number of parameters that describe the model and that have to be estimated is high, and consequently the learning procedure can be computationally expensive for real complex applications. Moreover, it is necessary to specify a priori the maximum duration allowed in each state.

To alleviate the high dimensionality of the parameter space, parametric duration models have been proposed. For example, Salfner [6] proposed a generic parametric continuous distribution to model the state sojourn time. However, in that model the observations are assumed to be discrete, and the model is applied to recognize failure-prone observation sequences. Using continuous observations, Azimi et al. [30–32] specified an HSMM with a parametric duration distribution belonging to the Gamma family and modeled the observation process by a Gaussian.

Inspired by the latter two approaches, in this work we propose a generic specification of a parametric HSMM in which no constraints are made on the model of the state duration and on the observation processes. In our approach, the state duration is modeled as a generic parametric density function. On the other hand, the observations can be modeled either as a discrete stochastic process or as a continuous mixture of Gaussians. The latter has been shown to approximate arbitrarily closely any finite continuous density function [33]. The proposed model can therefore be used in a wide range of applications and types of data. Moreover, in this paper we introduce a new and more effective estimator of the time spent by the system in a determinate state prior to the current time. To the best of our knowledge, apart from the above referred works, the literature on HSMMs applied to prognosis and predictive maintenance for industrial machines is limited [34]. Hence, the present work aims to show the effectiveness of the proposed duration model in solving condition monitoring and RUL estimation problems.

Dealing with state space models, and in particular with HSMMs, one should define the number of states, the correct family of duration density, and, in case of continuous observations, the adequate number of Gaussian mixtures. Such parameters play a prominent role, since the right model configuration is essential to enable an accurate modeling of the dynamic pattern and the covariance structure of the observed time series. The estimation of a satisfactory model configuration is referred to as model selection in the literature.

While several state-of-the-art approaches use expert knowledge to get insight on the model structure [15, 35, 36], an automated methodology for model selection is often required. In the literature, model selection has been deeply studied for a wide range of models. Among the existing methodologies, information-based techniques have been extensively analyzed, with satisfactory results.

Although the Bayesian Information Criterion (BIC) is particularly appropriate for finite mixture models [37, 38], the Akaike Information Criterion (AIC) has been demonstrated to outperform BIC when applied to more complex models and when the sample size is limited [39, 40], which is the case of the target application of this paper.

In this work, AIC is used to estimate the correct model configuration, with the final goal of an automated HSMM model selection which exploits only the information available in the input data. While model selection techniques have been extensively used in the framework of Hidden Markov Models [41–43], to the best of our knowledge the present work is the first that proposes their application to duration models and in particular to HSMMs.

In summary, the present work contributes to condition monitoring, predictive maintenance, and RUL estimation problems by:

(i) proposing a general Hidden Semi-Markov Model applicable to continuous or discrete observations and with no constraints on the density function used to model the state duration;

(ii) proposing a more effective estimator of the state duration variable $d_t(i)$, that is, the time spent by the system in the $i$th state prior to the current time $t$;

(iii) adapting the learning, inference, and prediction algorithms considering the defined HSMM parameters and the proposed $d_t(i)$ estimator;

(iv) using the Akaike Information Criterion for automatic model selection.

The rest of the paper is organized as follows: in Section 2 we introduce the theory of the proposed HSMM together with its learning, inference, and prediction algorithms. Section 3 gives a short theoretical overview of the Akaike Information Criterion. Section 4 presents the methodology used to estimate the Remaining Useful Lifetime using the proposed HSMM. In Section 5, experimental results are discussed. The conclusion and future research directions are given in Section 6.

2. Hidden Semi-Markov Models

Hidden Semi-Markov Models (HSMMs) introduce the concept of variable duration, which results in a more accurate modeling power if the system being modeled shows a dependence on time.

In this section we give the specification of the proposed HSMM, for which we model the state duration with a parametric state-dependent distribution. Compared to nonparametric modeling, this approach has two main advantages:

(i) the model is specified by a limited number of parameters; as a consequence, the learning procedure is computationally less expensive;

(ii) the model does not require a priori knowledge of the maximum sojourn time allowed in each state, this being inherently learnt through the duration distribution parameters.


2.1. Model Specification

A Hidden Semi-Markov Model is a doubly embedded stochastic model with an underlying stochastic process that is not observable (hidden) but can only be observed through another set of stochastic processes that produce the sequence of observations. An HSMM allows the underlying process to be a semi-Markov chain with a variable duration, or sojourn time, for each state. The key concept of HSMMs is that the semi-Markov property holds for this model: while in HMMs the Markov property implies that the value of the hidden state at time $t$ depends exclusively on its value at time $t-1$, in HSMMs the probability of transition from state $S_j$ to state $S_i$ at time $t$ depends on the duration spent in state $S_j$ prior to time $t$.

In the following, we denote the number of states in the model as $N$, the individual states as $S = \{S_1, \ldots, S_N\}$, and the state at time $t$ as $s_t$. The semi-Markov property can be written as
\[
P(s_{t+1} = S_i \mid s_t = S_j, \ldots, s_1 = S_k) = P(s_{t+1} = S_i \mid s_t = S_j, d_t(j)), \quad 1 \le i, j, k \le N, \tag{1}
\]
where the duration variable $d_t(j)$ is defined as the time spent in state $S_j$ prior to time $t$.

Although the state duration is inherently discrete, in many studies [44, 45] it has been modeled with a continuous parametric density function. Similar to the work of Azimi et al. [30–32], in this paper we use the discrete counterpart of the chosen parametric probability density function (pdf). With this approximation, if we denote the pdf of the sojourn time in state $S_i$ as $f(x; \theta_i)$, where $\theta_i$ represents the set of parameters of the pdf relative to the $i$th state, the probability that the system stays in state $S_i$ for exactly $d$ time steps can be calculated as $\int_{d-1}^{d} f(x; \theta_i)\,dx$. Considering the HSMM formulation, we can generally denote the state-dependent duration distributions by the set of their parameters relative to each state as $\Theta = \{\theta_1, \ldots, \theta_N\}$.

Many related works on HSMMs [31, 32, 44, 45] consider $f(x; \theta_i)$ within the exponential family. In particular, Gamma distributions are often used in speech processing applications. In this work, we do not impose a type of distribution function to model the duration. The only requirement is that the duration should be modeled as a positive function, negative durations being physically meaningless.

HSMMs also require the definition of a "dynamic" transition matrix, as a consequence of the semi-Markov property. Differently from HMMs, in which a constant transition probability leads to a geometrically distributed state sojourn time, HSMMs explicitly define a transition matrix which, depending on the duration variable, has increasing probabilities of changing state as time goes on. For convenience, we specify the state duration variable in the form of a vector $\mathbf{d}_t$ with dimensions $N \times 1$ as
\[
\mathbf{d}_t = \begin{cases} d_t(j) & \text{if } s_t = S_j, \\ 1 & \text{if } s_t \ne S_j. \end{cases} \tag{2}
\]
The quantity $d_t(j)$ can be easily calculated by induction from $d_{t-1}(j)$ as
\[
d_t(j) = s_t(j) \cdot s_{t-1}(j) \cdot d_{t-1}(j) + 1, \tag{3}
\]
where $s_t(j)$ is 1 if $s_t = S_j$, and 0 otherwise.

If we assume that at time $t$ the system is in state $S_i$, we can formally define the duration-dependent transition matrix as $\mathbf{A}_{\mathbf{d}_t} = [a_{ij}(\mathbf{d}_t)]$, with
\[
a_{ij}(\mathbf{d}_t) = P(s_{t+1} = S_j \mid s_t = S_i, d_t(i)), \quad 1 \le i, j \le N. \tag{4}
\]

The specification of the model can be further simplified by observing that, at each time $t$, the matrix $\mathbf{A}_{\mathbf{d}_t}$ can be decomposed into two terms: the recurrent and the nonrecurrent state transition probabilities.

The recurrent transition probabilities $\mathbf{P}(\mathbf{d}_t) = [p_{ij}(\mathbf{d}_t)]$, which depend only on the duration vector $\mathbf{d}_t$ and the parameters $\Theta$, take into account the dynamics of the self-transition probabilities. Each entry is defined as the probability of remaining in the current state at the next time step, given the duration spent in the current state prior to time $t$:
\[
\begin{aligned}
p_{ii}(\mathbf{d}_t) &= P(s_{t+1} = S_i \mid s_t = S_i, d_t(i)) \\
&= P(s_{t+1} = S_i \mid s_t = S_i, s_{t-1} = S_i, \ldots, s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \ne S_i) \\
&= \frac{P(s_{t+1} = S_i, s_t = S_i, \ldots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \ne S_i)}{P(s_t = S_i, s_{t-1} = S_i, \ldots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \ne S_i)}.
\end{aligned} \tag{5}
\]

The denominator in (5) can be expressed as $\sum_{k=1}^{\infty} P(s_{t+k} = S_i, s_{t+k-1} = S_i, \ldots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \ne S_i)$, which is the probability that the system at time $t$ has been staying in state $S_i$ for at least $d_t(i) - 1$ time units. The above expression is equivalent to $1 - F(d_t(i) - 1; \theta_i)$, where $F(\cdot; \theta_i)$ is the duration cumulative distribution function relative to the state $S_i$, that is, $F(d; \theta) = \int_{-\infty}^{d} f(x; \theta)\,dx$. As a consequence, from (5) we can define the recurrent transition probabilities as a diagonal matrix with dimensions $N \times N$ as
\[
\mathbf{P}(\mathbf{d}_t) = [p_{ij}(\mathbf{d}_t)] = \begin{cases} \dfrac{1 - F(d_t(i); \theta_i)}{1 - F(d_t(i) - 1; \theta_i)} & \text{if } i = j, \\[2mm] 0 & \text{if } i \ne j. \end{cases} \tag{6}
\]

The usage of the cumulative functions in (6), which tend to 1 as the duration tends to infinity, implies that the probability of self-transition decreases as the sojourn time increases, leading the model to always leave the current state as time approaches infinity.
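As a minimal sketch of this behavior (our illustration, not the authors' code), the self-transition probability of (6) can be evaluated for a single state under the assumption of a Gamma duration density; the shape and scale values below are purely hypothetical.

```python
# Self-transition probability p_ii(d) of Equation (6), assuming a Gamma duration pdf.
from scipy.stats import gamma

shape, scale = 25.0, 4.0                               # hypothetical theta_i (mean sojourn time 100)
F = lambda d: gamma.cdf(d, a=shape, scale=scale)       # duration CDF F(d; theta_i)

def p_stay(d):
    """p_ii(d) = (1 - F(d)) / (1 - F(d - 1)), Equation (6)."""
    return (1.0 - F(d)) / (1.0 - F(d - 1))

for d in [50, 90, 100, 110, 150]:
    print(d, round(p_stay(d), 4))                      # decays toward 0 as the sojourn time grows
```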

The nonrecurrent state transition probabilities $\mathbf{A}^0 = [a^0_{ij}]$ rule the transitions between two different states. They are represented by an $N \times N$ matrix with the diagonal elements equal to zero, defined as
\[
\mathbf{A}^0 = [a^0_{ij}] = \begin{cases} 0 & \text{if } i = j, \\ P(s_{t+1} = S_j \mid s_t = S_i) & \text{if } i \ne j. \end{cases} \tag{7}
\]
$\mathbf{A}^0$ must be specified as a stochastic matrix, that is, its elements have to satisfy the constraint $\sum_{j=1}^{N} a^0_{ij} = 1$ for all $i$.

As a consequence of the above decomposition, the dynamics of the underlying semi-Markov chain can be defined by specifying only the state-dependent duration parameters $\Theta$ and the nonrecurrent matrix $\mathbf{A}^0$, since the model transition matrix can be calculated at each time $t$ using (6) and (7):
\[
\mathbf{A}_{\mathbf{d}_t} = \mathbf{P}(\mathbf{d}_t) + \left(\mathbf{I} - \mathbf{P}(\mathbf{d}_t)\right)\mathbf{A}^0, \tag{8}
\]
where $\mathbf{I}$ is the identity matrix. If we denote the elements of the dynamic transition matrix $\mathbf{A}_{\mathbf{d}_t}$ as $a_{ij}(\mathbf{d}_t)$, the stochastic constraint $\sum_{j=1}^{N} a_{ij}(\mathbf{d}_t) = 1$, for all $i$ and $t$, is guaranteed by the fact that $\mathbf{P}(\mathbf{d}_t)$ is a diagonal matrix and $\mathbf{A}^0$ is a stochastic matrix.

For several applications it is necessary to model an absorbing state, which in the case of industrial equipment corresponds to the "broken" or "failure" state. If we denote the absorbing state as $S_k$, with $k \in [1, N]$, we must fix the $k$th row of the nonrecurrent matrix $\mathbf{A}^0$ to be $a^0_{kk} = 1$ and $a^0_{ki} = 0$ for all $1 \le i \le N$ with $i \ne k$. By substituting such an $\mathbf{A}^0$ matrix in (8), it is easy to show that the element $a_{kk}(\mathbf{d}_t) = 1$ and remains constant for all $t$, while the duration parameters $\theta_k$ have no influence on the absorbing state $S_k$. An example of absorbing state specification will be given in Section 5.

With respect to the input observation signals, in this work

we consider both continuous and discrete data, by adapting the suitable observation model depending on the observation nature. In particular, for the continuous case we model the observations with a multivariate mixture of Gaussian distributions. This choice presents two main advantages: (i) a multivariate model allows dealing with multiple observations at the same time, which is often the case in industrial equipment modeling, since at each time multiple sensors' measurements are available, and (ii) mixtures of Gaussians have been proved to closely approximate any finite and continuous density function [33]. Formally, if we denote by $\mathbf{x}_t$ the observation vector at time $t$ and the generic observation vector being modeled as $\mathbf{x}$, the observation density for the $j$th state is represented by a finite mixture of $M$ Gaussians:
\[
b_j(\mathbf{x}) = \sum_{m=1}^{M} c_{jm}\,\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_{jm}, \mathbf{U}_{jm}), \quad 1 \le j \le N, \tag{9}
\]
where $c_{jm}$ is the mixture coefficient for the $m$th mixture in state $S_j$, which satisfies the stochastic constraints $\sum_{m=1}^{M} c_{jm} = 1$ for $1 \le j \le N$ and $c_{jm} \ge 0$ for $1 \le j \le N$, $1 \le m \le M$, while $\mathcal{N}$ is the Gaussian density with mean vector $\boldsymbol{\mu}_{jm}$ and covariance matrix $\mathbf{U}_{jm}$ for the $m$th mixture component in state $j$.
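Putting (6)-(8) together, the following sketch (our illustration, not the paper's implementation) assembles the duration-dependent transition matrix for a small left-right chain whose last state is absorbing; the Gamma duration family and all numeric values are assumptions made for the example.

```python
# Dynamic transition matrix A_{d_t} = P(d_t) + (I - P(d_t)) A0, Equations (6)-(8).
import numpy as np
from scipy.stats import gamma

N = 3
A0 = np.array([[0.0, 1.0, 0.0],          # nonrecurrent transitions A^0, Equation (7)
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]])          # last row: absorbing state, a0_kk = 1
theta = [(25.0, 4.0), (16.0, 5.0), (25.0, 4.0)]   # hypothetical Gamma (shape, scale) per state

def F(d, i):
    a, s = theta[i]
    return gamma.cdf(d, a=a, scale=s)

def A_dyn(d):
    """Build A_{d_t} from the per-state durations d."""
    p = np.array([(1 - F(d[i], i)) / (1 - F(d[i] - 1, i)) for i in range(N)])  # Equation (6)
    return np.diag(p) + (np.eye(N) - np.diag(p)) @ A0                          # Equation (8)

A = A_dyn(np.array([80.0, 1.0, 1.0]))     # 80 time steps already spent in state 1
print(A.round(3))
print(A.sum(axis=1))                       # every row still sums to 1; last row stays [0, 0, 1]
```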

In case of discrete data, we model the observations within each state with a nonparametric discrete probability distribution. In particular, if $L$ is the number of distinct observation symbols per state and if we denote the symbols as $X = \{X_1, \ldots, X_L\}$ and the observation at time $t$ as $x_t$, the observation symbol probability distribution can be defined as a matrix $B = [b_j(l)]$ of dimensions $N \times L$, where
\[
b_j(l) = P(x_t = X_l \mid s_t = S_j), \quad 1 \le j \le N,\ 1 \le l \le L. \tag{10}
\]
Since the system in each state at each time step emits one of the $L$ possible symbols, the matrix $B$ is stochastic, that is, it is constrained to $\sum_{l=1}^{L} b_j(l) = 1$ for all $1 \le j \le N$.

Finally, as in the case of HMMs, we specify the initial state distribution $\pi = \{\pi_i\}$, which defines the probability of the starting state as
\[
\pi_i = P(s_1 = S_i), \quad 1 \le i \le N. \tag{11}
\]

From the above considerations, two different HSMM models can be considered: in the case of continuous observations, $\lambda = (\mathbf{A}^0, \Theta, C, \mu, U, \pi)$, while in the case of discrete observations the HSMM is characterized by $\lambda = (\mathbf{A}^0, \Theta, B, \pi)$. An example of a continuous HSMM with 3 states is shown in Figure 1.

2.2. Learning and Inference Algorithms

Let us denote the generic sequence of observations, being indiscriminately continuous vectors or discrete symbols, as $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$. In order to use the defined HSMM model in practice, similarly to the HMM, we need to solve three basic problems:

(1) Given the observation $\mathbf{x}$ and a model $\lambda$, calculate the probability that the sequence $\mathbf{x}$ has been generated by the model $\lambda$, that is, $P(\mathbf{x} \mid \lambda)$.

(2) Given the observation $\mathbf{x}$ and a model $\lambda$, calculate the state sequence $S = s_1 s_2 \cdots s_T$ which has most probably generated the sequence $\mathbf{x}$.

(3) Given the observation $\mathbf{x}$, find the parameters of the model $\lambda$ which maximize $P(\mathbf{x} \mid \lambda)$.

As in the case of HMMs, solving the above problems requires using the forward-backward [13], decoding (Viterbi [46] and Forney [47]), and Expectation-Maximization [48] algorithms, which will be adapted to the HSMM introduced in Section 2.1. In the following, we also propose a more effective estimator of the state duration variable $d_t(i)$ defined in (2).

2.2.1. The Forward-Backward Algorithm

Given a generic sequence of observations $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, the goal is to calculate the model likelihood, that is, $P(\mathbf{x} \mid \lambda)$. This quantity is useful for the training procedure, where the parameters that locally maximize the model likelihood are chosen, as well as for classification problems. The latter is the case in which the observation sequence $\mathbf{x}$ has to be mapped to one of a finite set of $C$ classes, represented by a set of HSMM parameters $L = \{\lambda_1, \ldots, \lambda_C\}$. The class of $\mathbf{x}$ is chosen such that $\lambda(\mathbf{x}) = \arg\max_{\lambda \in L} P(\mathbf{x} \mid \lambda)$.

To calculate the model likelihood, we first define the forward variable at each time $t$ as
\[
\alpha_t(i) = P(\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, s_t = S_i \mid \lambda), \quad 1 \le i \le N. \tag{12}
\]


Figure 1: Graphical representation of an HSMM with three hidden states $S_1$, $S_2$, $S_3$, their observation probabilities $P(o \mid S_i)$, their sojourn probabilities $d_i(u)$, and the transitions $a_{12}$ and $a_{23}$.

Contrarily to HMMs, for HSMMs the state duration must be taken into account in the forward variable calculation. Consequently, Yu [26] proposed the following inductive formula:
\[
\alpha_t(j, d) = \sum_{d'=1}^{D} \sum_{i=1}^{N} \left( \alpha_{t-d'}(i, d')\, a^0_{ij}\, p_{jj}(d') \prod_{k=t-d+1}^{t} b_j(\mathbf{x}_k) \right), \quad 1 \le j \le N,\ 1 \le t \le T, \tag{13}
\]
that is, the sum of the probabilities of being in the current state $S_j$ (at time $t$) for the past $d'$ time units (with $1 \le d' \le D$ and $D$ the maximum allowed duration for each state), coming from all the possible previous states $S_i$, $1 \le i \le N$ and $i \ne j$.

The disadvantage of the above formulation is that, as discussed in the Introduction, the specification of the maximum duration $D$ represents a limitation to the modeling generalization. Moreover, from (13) it is clear that the computation and memory complexities drastically increase with $D$, which can be very large in many applications, in particular for online failure prediction.

To alleviate this problem, Azimi et al. [30–32] introduced a new forward algorithm for HSMMs that, by keeping track of the estimated average state duration at each iteration, has a computational complexity comparable to the forward algorithm for HMMs [13]. However, the average state duration represents an approximation. Consequently, the forward algorithm of Azimi, compared with (13), pays the price of lower precision in favor of an (indispensable) better computational efficiency.

To calculate the forward variable $\alpha_t(j)$ using Azimi's approach, the duration-dependent transition matrix defined in (8) is taken into consideration in the induction formula of (13), which becomes [30]
\[
\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_{t-1}) \right] b_j(\mathbf{x}_t). \tag{14}
\]

To calculate the above formula, the average state duration of (2) must be estimated for each time $t$ by means of the variable $\bar{\mathbf{d}}_t = [\bar{d}_t(i)]$, defined as
\[
\bar{d}_t(i) = \mathbb{E}\left(d_t(i) \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, s_t = S_i, \lambda\right), \quad 1 \le i \le N, \tag{15}
\]
where $\mathbb{E}$ denotes the expected value. To calculate the above quantity, Azimi et al. [30–32] use the following formula:
\[
\bar{\mathbf{d}}_t = \boldsymbol{\gamma}_{t-1} \odot \bar{\mathbf{d}}_{t-1} + 1, \tag{16}
\]
where $\odot$ represents the element-by-element product between two matrices/vectors, and the vector $\boldsymbol{\gamma}_t = [\gamma_t(i)]$ (the probability of being in state $S_i$ at time $t$ given the observation sequence and the model parameters), with dimensions $N \times 1$, is calculated in terms of $\alpha_t(i)$ as
\[
\gamma_t(i) = P(s_t = S_i \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}, \quad 1 \le i \le N. \tag{17}
\]

Equation (16) is based on the following induction formula [30–32] that rules the dynamics of the duration vector when the system's state is known:
\[
d_t(i) = s_{t-1}(i) \cdot d_{t-1}(i) + 1, \tag{18}
\]
where, for each $t$, $s_t(i)$ is 1 if $s_t = S_i$, and 0 otherwise.


A simple example shows that (18) is incorrect: assuming an HSMM with three states and considering the state sequence $(S_1, S_1, S_2, \ldots)$, the correct sequence of the duration vector is $\mathbf{d}_1 = [1\ 1\ 1]^T$, $\mathbf{d}_2 = [2\ 1\ 1]^T$, and $\mathbf{d}_3 = [1\ 1\ 1]^T$, where the superscript $T$ denotes vector transpose. If we apply (18), we obtain $\mathbf{d}_1 = [1\ 1\ 1]^T$, $\mathbf{d}_2 = [2\ 1\ 1]^T$, and $\mathbf{d}_3 = [1\ 2\ 1]^T$, which is in contradiction with the definition of the state duration vector given in (2).

state duration vector given in (2)To calculate the average state duration variable 119889

119905(119894) we

propose a new induction formula that estimates for each time119905 the time spent in the 119894th state prior to 119905 as

119889119905(119894) = P (119904

119905minus1= 119878

119894| 119904

119905= 119878

119894 x

1 x

119905) sdot (119889

119905minus1(119894) + 1) (19)

=119886119894119894(d

119905minus1) sdot 120572

119905minus1(119894) sdot 119887

119894(x

119905)

120572119905(119894)

sdot (119889119905minus1

(119894) + 1)

1 le 119894 le 119873

(20)

The derivation of (20) is given in Appendix The intuitionbehind (19) is that the current average duration is the previousaverage duration plus one weighted with the ldquoamountrdquo of thecurrent state that was already in state 119878

119894in the previous step

Using the proposed (20), the forward algorithm can be specified as follows:

(1) initialization, with $1 \le i \le N$:
\[
\alpha_1(i) = \pi_i\, b_i(\mathbf{x}_1), \qquad \bar{d}_1(i) = 1, \qquad \mathbf{A}_{\bar{\mathbf{d}}_1} = \mathbf{P}(\bar{\mathbf{d}}_1) + \left(\mathbf{I} - \mathbf{P}(\bar{\mathbf{d}}_1)\right)\mathbf{A}^0, \tag{21}
\]
where $\mathbf{P}(\bar{\mathbf{d}}_1)$ is estimated using (6);

(2) induction, with $1 \le j \le N$ and $1 \le t \le T - 1$:
\[
\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t) \right] b_j(\mathbf{x}_{t+1}), \tag{22}
\]
\[
\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\bar{d}_t(i) + 1\right), \tag{23}
\]
\[
\mathbf{A}_{\bar{\mathbf{d}}_{t+1}} = \mathbf{P}(\bar{\mathbf{d}}_{t+1}) + \left(\mathbf{I} - \mathbf{P}(\bar{\mathbf{d}}_{t+1})\right)\mathbf{A}^0, \tag{24}
\]
where $a_{ij}(\bar{\mathbf{d}}_t)$ are the coefficients of the matrix $\mathbf{A}_{\bar{\mathbf{d}}_t}$;

(3) termination:
\[
P(\mathbf{x} \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i). \tag{25}
\]
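The following is a minimal sketch (assuming discrete observations, Gamma durations, and a small illustrative model that is not the paper's experiment) of the forward pass (21)-(25) with the proposed duration update (23) and the matrix refresh (24).

```python
# Forward pass with duration tracking, Equations (21)-(25).
import numpy as np
from scipy.stats import gamma

N = 3
pi = np.array([1.0, 0.0, 0.0])
A0 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]])                       # left-right, last state absorbing
B = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])     # b_j(l) for L = 2 symbols
theta = [(9.0, 1.0), (9.0, 1.0), (9.0, 1.0)]           # hypothetical Gamma (shape, scale)

def p_stay(d, i):                                      # Equation (6)
    a, s = theta[i]
    return (1 - gamma.cdf(d, a=a, scale=s)) / (1 - gamma.cdf(d - 1, a=a, scale=s))

def A_dyn(d):                                          # Equations (8) and (24)
    P = np.diag([p_stay(d[i], i) for i in range(N)])
    return P + (np.eye(N) - P) @ A0

def forward(obs):
    T = len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                       # Equation (21)
    d = np.ones(N)
    A = A_dyn(d)
    for t in range(1, T):
        b = B[:, obs[t]]
        alpha[t] = (alpha[t - 1] @ A) * b              # Equation (22)
        prev = np.diag(A) * alpha[t - 1] * b           # a_ii(d_t) alpha_t(i) b_i(x_{t+1})
        w = np.where(alpha[t] > 0, prev / np.maximum(alpha[t], 1e-300), 0.0)
        d = np.where(alpha[t] > 0, w * (d + 1), 1.0)   # Equation (23)
        A = A_dyn(d)                                   # Equation (24)
    return alpha, alpha[-1].sum()                      # Equation (25): P(x | lambda)

obs = [0] * 8 + [1] * 8
alpha, lik = forward(obs)
print("P(x | lambda) =", lik)
```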

Similar considerations as for the forward procedure can be made for the backward algorithm, which is implemented by defining the variable $\beta_t(i)$ as
\[
\beta_t(i) = P(\mathbf{x}_{t+1} \mathbf{x}_{t+2} \cdots \mathbf{x}_T \mid s_t = S_i, \lambda), \quad 1 \le i \le N. \tag{26}
\]
Having estimated the dynamic transition matrix $\mathbf{A}_{\bar{\mathbf{d}}_t}$ for each $1 \le t \le T$ using (24), the backward variable can be calculated inductively as follows:

(1) Initialization:
\[
\beta_T(i) = 1, \quad 1 \le i \le N. \tag{27}
\]

(2) Induction:
\[
\beta_t(i) = \sum_{j=1}^{N} a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1,\ 1 \le i \le N. \tag{28}
\]

Although the variable $\beta_t(i)$ is not necessary for the calculation of the model likelihood, it will be useful in the parameter reestimation procedure, as will be explained in Section 2.2.3.

2.2.2. The Viterbi Algorithm

The Viterbi algorithm [46, 47] (also known as decoding) allows determining the best state sequence corresponding to a given observation sequence. Formally, given a sequence of observations $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, the best state sequence $S^* = s^*_1 s^*_2 \cdots s^*_T$ corresponding to $\mathbf{x}$ is calculated by defining the variable $\delta_t(i)$ as
\[
\delta_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P(s_1 s_2 \cdots s_t = S_i, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t \mid \lambda). \tag{29}
\]
The procedure to recursively calculate the variable $\delta_t(i)$ and to retrieve the target state sequence (i.e., the arguments which maximize the $\delta_t(i)$'s) for the proposed HSMM is a straightforward extension of the Viterbi algorithm for HMMs [13]. The only change is the usage, in the recursive calculation of $\delta_t(i)$, of the dynamic transition matrix $\mathbf{A}_{\bar{\mathbf{d}}_t} = [a_{ij}(\bar{\mathbf{d}}_t)]$ calculated through (24). The Viterbi algorithm for the introduced parametric HSMMs can be summarized as follows:

(1) initialization, with $1 \le i \le N$:
\[
\delta_1(i) = \pi_i\, b_i(\mathbf{x}_1), \qquad \psi_1(i) = 0; \tag{30}
\]

(2) recursion, with $1 \le j \le N$ and $2 \le t \le T$:
\[
\delta_t(j) = \max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_t) \right] b_j(\mathbf{x}_t), \tag{31}
\]
\[
\psi_t(j) = \arg\max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_t) \right]; \tag{32}
\]

(3) termination:
\[
P^* = \max_{1 \le i \le N} \left[ \delta_T(i) \right], \tag{33}
\]
\[
s^*_T = \arg\max_{1 \le i \le N} \left[ \delta_T(i) \right], \tag{34}
\]
where we keep track of the argument maximizing (31) using the vector $\psi_t$, which, tracked back, gives the desired best state sequence:
\[
s^*_t = \psi_{t+1}(s^*_{t+1}), \quad t = T-1, T-2, \ldots, 1. \tag{35}
\]
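A compact sketch of the recursion (30)-(35) is given below, again under the assumption that the duration-dependent matrices from (24) are supplied externally; the two-state inputs are dummy values chosen only to make the example runnable.

```python
# Viterbi decoding with duration-dependent transition matrices, Equations (30)-(35).
import numpy as np

def viterbi(pi, A_list, b_list):
    T, N = len(b_list), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * b_list[0]                              # Equation (30)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A_list[t - 1]     # delta_{t-1}(i) a_ij(d_t)
        psi[t] = scores.argmax(axis=0)                     # Equation (32)
        delta[t] = scores.max(axis=0) * b_list[t]          # Equation (31)
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()                          # Equation (34)
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]                  # Equation (35): backtracking
    return path, delta[-1].max()                           # P*, Equation (33)

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.0, 1.0]])
b_list = [np.array([0.9, 0.1])] * 3 + [np.array([0.2, 0.8])] * 3
print(viterbi(pi, [A] * 6, b_list))
```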


2.2.3. The Training Algorithm

The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by $\lambda = (\mathbf{A}^0, \Theta, C, \mu, U, \pi)$ if the observations are continuous, or $\lambda = (\mathbf{A}^0, \Theta, B, \pi)$ if the observations are discrete. Given a generic observation sequence $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, referred to as the training set in the following, the training procedure consists of finding the model parameter set $\lambda^*$ which locally maximizes the model likelihood $P(\mathbf{x} \mid \lambda)$.

We use the modified Baum-Welch algorithm of Azimi et al. [30–32]. However, in our implementation we do not make assumptions on the density function used to model the state duration, and we consider both continuous and discrete observations.

Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure which consists of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters $\lambda^0$, and it is iterated until the likelihood function does not improve between two consecutive iterations.

Similarly to HMMs, the reestimation formulas are derived by firstly introducing the variable $\xi_t(i, j)$, which represents the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t + 1$, given the model and the observation sequence:
\[
\xi_t(i, j) = P(s_t = S_i, s_{t+1} = S_j \mid \mathbf{x}, \lambda). \tag{36}
\]
However, in the HSMM case, the variable $\xi_t(i, j)$ considers the duration estimation performed in the forward algorithm (see (24)). Formulated in terms of the forward and backward variables, it is given by
\[
\xi_t(i, j) = \frac{P(s_t = S_i, s_{t+1} = S_j, \mathbf{x} \mid \lambda)}{P(\mathbf{x} \mid \lambda)}
= \frac{\alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}. \tag{37}
\]

From $\xi_t(i, j)$ we can derive the quantity $\gamma_t(i)$ (already defined in (17)), representing the probability of being in state $S_i$ at time $t$ given the observation sequence and the model parameters:
\[
\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j). \tag{38}
\]

Finally, the reestimation formulas for the parameters $\pi$ and $\mathbf{A}^0$ are given by
\[
\bar{\pi}_i = \gamma_1(i), \tag{39}
\]
\[
\bar{a}^0_{ij} = \frac{\left( \sum_{t=1}^{T-1} \xi_t(i, j) \right) \odot G}{\sum_{j=1}^{N} \left( \sum_{t=1}^{T-1} \xi_t(i, j) \right) \odot G}, \tag{40}
\]
where $G = [g_{ij}]$ is a square matrix of dimensions $N \times N$, with $g_{ij} = 0$ for $i = j$ and $g_{ij} = 1$ for $i \ne j$, and $\odot$ represents the element-by-element product between two matrices; $\sum_{t=1}^{T-1} \gamma_t(i)$ is the expected number of transitions from state $S_i$, and $\sum_{t=1}^{T-1} \xi_t(i, j)$ is the expected number of transitions from state $S_i$ to state $S_j$.

Equation (39) represents the expected number of times that the model starts in state $S_i$, while (40) represents the expected number of transitions from state $S_i$ to state $S_j$, with $i \ne j$, over the total expected number of transitions from state $S_i$ to any other state different from $S_i$.
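A sketch of how (37)-(40) can be evaluated numerically is shown below; the forward/backward variables, dynamic matrices, and emissions are passed in (here filled with dummy random values), so the code illustrates only the reestimation step itself, not the full Baum-Welch loop.

```python
# Reestimation of pi and A0 from xi_t(i, j), Equations (37)-(40).
import numpy as np

def reestimate_pi_A0(alpha, beta, A_list, b_list):
    T, N = alpha.shape
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        num = alpha[t][:, None] * A_list[t] * (b_list[t + 1] * beta[t + 1])[None, :]
        xi[t] = num / num.sum()                            # Equation (37)
    pi_new = xi[0].sum(axis=1)                             # Equations (38)-(39) at t = 1
    G = 1.0 - np.eye(N)                                    # zero-diagonal mask
    num = xi.sum(axis=0) * G
    A0_new = num / np.maximum(num.sum(axis=1, keepdims=True), 1e-300)   # Equation (40)
    return pi_new, A0_new

T, N = 5, 2
alpha, beta = np.random.rand(T, N), np.random.rand(T, N)  # dummy forward/backward values
A_list = [np.array([[0.8, 0.2], [0.1, 0.9]])] * T
b_list = [np.random.rand(N)] * T
print(reestimate_pi_A0(alpha, beta, A_list, b_list))
```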

Since the matrix $\mathbf{A}^0$ is normalized, the stochastic constraints are satisfied at each iteration, that is, $\sum_{j=1}^{N} \bar{a}^0_{ij} = 1$ for each $1 \le i \le N$, while the estimation of the prior probability $\bar{\pi}_i$ inherently sums up to 1 at each iteration, since it represents the expected frequency in state $S_i$ at time $t = 1$, for each $1 \le i \le N$.

With respect to the reestimation of the state duration parameters $\Theta$, we first estimate the mean $\mu_{id}$ and the variance $\sigma^2_{id}$ of the $i$th state duration, for each $1 \le i \le N$, from the forward and backward variables and the estimation of the state duration variable:
\[
\mu_{id} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right) \bar{d}_t(i)}{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right)}, \tag{41}
\]
\[
\sigma^2_{id} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right) \left( \bar{d}_t(i) - \mu_{id} \right)^2}{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right)}, \tag{42}
\]
where (41) can be interpreted as the probability of transition from state $S_i$ to $S_j$, with $i \ne j$, at time $t$, weighted by the duration of state $S_i$ at $t$, giving the desired expected value, while in (42) the same quantity is weighted by the squared distance of the duration at time $t$ from its mean, giving the estimation of the variance.

Then the parameters of the desired duration distribution can be estimated from $\mu_{id}$ and $\sigma^2_{id}$. For example, if a Gamma distribution with shape parameter $\nu$ and scale parameter $\eta$ is chosen to model the state duration, the parameters $\nu_i$ and $\eta_i$ for each $1 \le i \le N$ can be calculated as $\nu_i = \mu^2_{id} / \sigma^2_{id}$ and $\eta_i = \sigma^2_{id} / \mu_{id}$.
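A minimal sketch of this moment-matching step is given below; the duration means and variances are hypothetical values standing in for the outputs of (41)-(42).

```python
# Moment matching from (mu_id, sigma2_id) to Gamma duration parameters.
import numpy as np

mu_d = np.array([100.0, 90.0, 80.0])       # hypothetical means from Equation (41)
sigma2_d = np.array([20.0, 15.0, 25.0])    # hypothetical variances from Equation (42)

nu = mu_d ** 2 / sigma2_d                  # Gamma shape per state
eta = sigma2_d / mu_d                      # Gamma scale per state
print(nu, eta)                             # e.g. first state: shape 500, scale 0.2
```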


Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the reestimation formulas are the same as for Hidden Markov Models [13].

In particular, for continuous observations, the parameters of the Gaussian mixture defined in (9) are reestimated by firstly defining the probability of being in state $S_j$ at time $t$, with the observation vector $\mathbf{x}_t$ evaluated by the $k$th mixture component, as
\[
\gamma_t(j, k) = \left[ \frac{\alpha_t(j)\, \beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} \right] \cdot \left[ \frac{c_{jk}\, \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_{jk}, \mathbf{U}_{jk})}{\sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_{jm}, \mathbf{U}_{jm})} \right]. \tag{43}
\]
By using the former quantity, the parameters $c_{jk}$, $\boldsymbol{\mu}_{jk}$, and $\mathbf{U}_{jk}$ are reestimated through the following formulas:
\[
\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j, m)}, \qquad
\bar{\boldsymbol{\mu}}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k) \cdot \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(j, k)}, \qquad
\bar{\mathbf{U}}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k) \cdot (\mathbf{x}_t - \boldsymbol{\mu}_{jk})(\mathbf{x}_t - \boldsymbol{\mu}_{jk})^T}{\sum_{t=1}^{T} \gamma_t(j, k)}, \tag{44}
\]
where the superscript $T$ denotes vector transpose.

For discrete observations, the reestimation formula for the observation matrix $b_j(l)$ is
\[
\bar{b}_j(l) = \frac{\sum_{t=1,\ x_t = X_l}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \tag{45}
\]
where the quantity $\gamma_t(j)$, which takes into account the duration-dependent forward variable $\alpha_t(j)$, is calculated through (17).

The reader is referred to Rabiner's work [13] for the interpretation of the observation parameter reestimation formulas.

3. AIC-Based Model Selection

In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states $N$, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures $M$ to be used. In this work, we make use of the Akaike Information Criterion (AIC). Indeed, it has been seen that, in case of complex models and in presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches like the Bayesian Information Criterion.

In general, information criteria are represented as a two-term structure. They account for a compromise between a measure of model fitness, which is based on the likelihood of the model, and a penalty term which takes into account the model complexity. Usually the model complexity is measured in terms of the number of parameters that have to be estimated and of the number of observations.

The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as
\[
\mathrm{AIC} = \frac{-\log L(\hat{\lambda}) + p}{T}, \tag{46}
\]
where $L(\hat{\lambda})$ is the likelihood of the model with the estimated parameters, as defined in (25), $p$ is the number of model parameters, and $T$ is the length of the observed sequence. The best model is the one minimizing (46).

Concerning $p$, the number of parameters to be estimated for a parametric HSMM with $N$ states is $p = p_h + p_o$, where $p_h$ counts the parameters of the hidden state layer, while $p_o$ counts those of the observation layer.

In particular, $p_h = (N - 1) + (N - 1) \cdot N + z \cdot N$, where

(i) $N - 1$ accounts for the prior probabilities $\pi$;

(ii) $(N - 1) \cdot N$ accounts for the nonrecurrent transition matrix $\mathbf{A}^0$;

(iii) $z \cdot N$ accounts for the duration probabilities, $z$ being the number of parameters $\theta$ of the duration distribution.

Concerning $p_o$, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with $L$ possible observable values, $p_o = (L - 1) \cdot N$, which accounts for the elements of the observation matrix $B$;

(ii) if the observations are continuous and a multivariate mixture of $M$ Gaussians with $O$ variates is used as observation model, $p_o = [O \cdot N \cdot M] + [O \cdot O \cdot N \cdot M] + [(M - 1) \cdot N]$, where each term accounts, respectively, for the mean vectors $\mu$, the covariance matrices $U$, and the mixture coefficients $C$.
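The parameter counting and the score of (46) can be sketched as follows; the log-likelihood value is a dummy placeholder, and the configuration ($N = 5$, $M = 2$, $O = 2$, $z = 2$) is just an example.

```python
# Parameter count and AIC score of Equation (46).
def num_params(N, z, continuous=True, M=1, O=1, L=None):
    p_h = (N - 1) + (N - 1) * N + z * N                  # priors + A0 + durations
    if continuous:
        p_o = O * N * M + O * O * N * M + (M - 1) * N    # means + covariances + mixture weights
    else:
        p_o = (L - 1) * N                                # discrete observation matrix B
    return p_h + p_o

def aic(log_lik, p, T):
    return (-log_lik + p) / T                            # Equation (46)

p = num_params(N=5, z=2, continuous=True, M=2, O=2)
print(p, aic(log_lik=-1234.5, p=p, T=650))               # dummy likelihood; T = 650 as in Section 5.1.1
```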

4. Remaining Useful Lifetime Estimation

One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time $D$ before entering a determinate state.

As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state $S_k$ that represents the failure condition is identified, at each moment the RUL can be defined as the expected time $D$ to reach the failure state $S_k$. If we assume that the time to failure is a random variable $D$ following a determinate probability density, we define the RUL at the current time $t$ as
\[
\mathrm{RUL}_t = \bar{D} = \mathbb{E}(D), \qquad s_{t+\bar{D}} = S_k,\ s_{t+\bar{D}-1} = S_i, \quad 1 \le i, k \le N,\ i \ne k, \tag{47}
\]
where $\mathbb{E}$ denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps, every time a new observation is acquired (online):

(1) estimation of the current state;

(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.

The estimation of the current state is performed via the Viterbi path, that is, the variable $\boldsymbol{\delta}_t = [\delta_t(i)]_{1 \le i \le N}$ defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable $\tilde{\delta}_t(i)$, obtained as
\[
\tilde{\delta}_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P(s_t = S_i \mid s_1 s_2 \cdots s_{t-1}, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda) = \frac{\delta_t(i)}{\sum_{j=1}^{N} \delta_t(j)}, \quad 1 \le i \le N, \tag{48}
\]
that is, an estimate of the probability of being in state $S_i$ at time $t$.

Together with the normalized variable $\tilde{\delta}_t(i)$, the maximum a posteriori estimate of the current state $s^*_t$ is taken into account, according to (34). If $s^*_t$ coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimation of the average remaining time in the current state, $d_{\mathrm{avg}}(s^*_t)$, is calculated as
\[
d_{\mathrm{avg}}(s^*_t) = \sum_{i=1}^{N} \left( \mu_{d_i} - \bar{d}_t(i) \right) \odot \tilde{\delta}_t(i), \tag{49}
\]
where $\mu_{d_i}$ denotes the expected value of the duration variable in state $S_i$, according to the duration distribution specified by the parameters $\theta_i$. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration $\bar{d}_t(i)$ at time $t$ from the expected sojourn time of state $S_i$, weighting the result by the uncertainty about the current state $\tilde{\delta}_t(i)$, and finally summing up all the contributions from each state.

In addition to the average remaining time, a lower and an upper bound value can be calculated, based on the standard deviation $\sigma_{d_i}$ of the duration distribution for state $S_i$:
\[
d_{\mathrm{low}}(s^*_t) = \sum_{i=1}^{N} \left( \mu_{d_i} - \sigma_{d_i} - \bar{d}_t(i) \right) \odot \tilde{\delta}_t(i), \tag{50}
\]
\[
d_{\mathrm{up}}(s^*_t) = \sum_{i=1}^{N} \left( \mu_{d_i} + \sigma_{d_i} - \bar{d}_t(i) \right) \odot \tilde{\delta}_t(i). \tag{51}
\]

Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimate:
\[
\tilde{\boldsymbol{\delta}}_{\mathrm{next}} = \left[ \tilde{\delta}_{t+\bar{d}}(i) \right]_{1 \le i \le N} = (\mathbf{A}^0)^T \cdot \tilde{\boldsymbol{\delta}}_t, \tag{52}
\]
while the maximum a posteriori estimate of the next state $s^*_{\mathrm{next}}$ is calculated as
\[
s^*_{\mathrm{next}} = s^*_{t+\bar{d}} = \arg\max_{1 \le i \le N} \tilde{\delta}_{t+\bar{d}}(i). \tag{53}
\]

Again, if $s^*_{t+\bar{d}}$ coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimate of the failure time is $D_{\mathrm{avg}} = d_{\mathrm{avg}}(s^*_t)$, calculated at the previous step, with the bound values $D_{\mathrm{low}} = d_{\mathrm{low}}(s^*_t)$ and $D_{\mathrm{up}} = d_{\mathrm{up}}(s^*_t)$. Otherwise, the estimation of the sojourn time of the next state is calculated as follows:
\[
d_{\mathrm{avg}}(s^*_{t+\bar{d}}) = \sum_{i=1}^{N} \mu_{d_i} \odot \tilde{\delta}_{t+\bar{d}}(i), \tag{54}
\]
\[
d_{\mathrm{low}}(s^*_{t+\bar{d}}) = \sum_{i=1}^{N} \left( \mu_{d_i} - \sigma_{d_i} \right) \odot \tilde{\delta}_{t+\bar{d}}(i), \tag{55}
\]
\[
d_{\mathrm{up}}(s^*_{t+\bar{d}}) = \sum_{i=1}^{N} \left( \mu_{d_i} + \sigma_{d_i} \right) \odot \tilde{\delta}_{t+\bar{d}}(i). \tag{56}
\]

This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:
\[
D_{\mathrm{avg}} = \sum d_{\mathrm{avg}}, \tag{57}
\]
\[
D_{\mathrm{low}} = \sum d_{\mathrm{low}}, \tag{58}
\]
\[
D_{\mathrm{up}} = \sum d_{\mathrm{up}}. \tag{59}
\]

Finally, Algorithm 1 details the above described RUL estimation procedure.

5. Experimental Results

To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


(1)  function RulEstimation(x_t, S_k)        ⊳ x_t: the last observation acquired
(2)                                          ⊳ S_k: the failure state
(3)  Initialization:
(4)    D_avg ← 0
(5)    D_low ← 0
(6)    D_up ← 0
(7)  Current state estimation:
(8)    Calculate δ̃_t                         ⊳ using (48)
(9)    Calculate s*_t                         ⊳ using (34)
(10)   Calculate d̄_t                         ⊳ using (20)
(11)   S ← s*_t
(12) Loop:
(13)   while S ≠ S_k do
(14)     Calculate d_avg                      ⊳ using (49) or (54)
(15)     Calculate d_low                      ⊳ using (50) or (55)
(16)     Calculate d_up                       ⊳ using (51) or (56)
(17)     D_avg ← D_avg + d_avg
(18)     D_low ← D_low + d_low
(19)     D_up ← D_up + d_up
(20)     Calculate δ̃_next                     ⊳ using (52)
(21)     Calculate s*_next                    ⊳ using (53)
(22)     S ← s*_next
       end while
(23)   return D_avg, D_low, D_up

Algorithm 1: Remaining Useful Lifetime estimation (pseudo-code).
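A compact sketch of Algorithm 1 is given below. It assumes that the normalized state probabilities of (48), the duration estimates of (20), the per-state duration means and standard deviations, and the nonrecurrent matrix are already available; the function name, the max_hops safety guard, and all numeric values are our own illustrative choices.

```python
# RUL estimation loop in the spirit of Algorithm 1, Equations (49)-(59).
import numpy as np

def rul(delta_norm, d_bar, mu_d, sigma_d, A0, failure_state, max_hops=100):
    D_avg = D_low = D_up = 0.0
    s_star = int(delta_norm.argmax())                    # Equation (34)
    first = True
    while s_star != failure_state and max_hops > 0:
        offset = d_bar if first else 0.0                 # elapsed time subtracted only for the current state
        D_avg += float(((mu_d - offset) * delta_norm).sum())            # Equation (49) or (54)
        D_low += float(((mu_d - sigma_d - offset) * delta_norm).sum())  # Equation (50) or (55)
        D_up  += float(((mu_d + sigma_d - offset) * delta_norm).sum())  # Equation (51) or (56)
        delta_norm = A0.T @ delta_norm                   # Equation (52)
        s_star = int(delta_norm.argmax())                # Equation (53)
        first, max_hops = False, max_hops - 1
    return D_avg, D_low, D_up                            # Equations (57)-(59)

A0 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=float)   # 3-state example; state 3 = failure
mu_d = np.array([100.0, 90.0, 200.0])
sigma_d = np.array([4.5, 3.9, 1.0])
print(rul(np.array([0.9, 0.1, 0.0]), np.array([60.0, 1.0, 1.0]),
          mu_d, sigma_d, A0, failure_state=2))
```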

5.1. Simulated Experiment

Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.

511 Data Generation The industrial machine subject ofthese experiments has been modeled as a left-right paramet-ric HSMM with 119873 = 5 states having state 119878

5as absorbing

(failure) stateThe choice of a left-right setting has beenmadefor simplicity reasons since the primary goal of this work is todemonstrate that the proposed model specification coupledwith the Akaike Information Criterion is effective to solveautomatic model selection online condition monitoringand prediction problems At this purpose we divided theexperiments in two cases according to the nature of theobservation being continuous or discrete

For each of the continuous and the discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains $T = 650$ observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM as follows:

$$\pi = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad
A_0 = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix},$$

$$\Theta_{\mathcal{N}} = \left\{\theta_1 = [100, 20],\ \theta_2 = [90, 15],\ \theta_3 = [100, 20],\ \theta_4 = [80, 25],\ \theta_5 = [200, 1]\right\},$$

$$\Theta_{\mathcal{G}} = \left\{\theta_1 = [500, 0.2],\ \theta_2 = [540, 0.1667],\ \theta_3 = [500, 0.2],\ \theta_4 = [256, 0.3125],\ \theta_5 = [800, 0.005]\right\},$$

$$\Theta_{\mathcal{W}} = \left\{\theta_1 = [102, 28],\ \theta_2 = [92, 29],\ \theta_3 = [102, 28],\ \theta_4 = [82, 20],\ \theta_5 = [200, 256]\right\}, \qquad (60)$$

where $\Theta_{\mathcal{N}}$, $\Theta_{\mathcal{G}}$, and $\Theta_{\mathcal{W}}$ are the distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean $\mu_d$ and the variance $\sigma^2_d$ of the Gaussian distribution, the shape $\nu_d$ and the scale $\eta_d$ of the Gamma distribution, and the scale $a_d$ and the shape $b_d$ of the Weibull distribution.

Figure 2: Example of the data generated with the parameters described in Section 5.1.1, for the continuous case (a) and the discrete case (b). Each panel shows the hidden state sequence, the state durations, and the observed signal (continuous case) or observed symbols (discrete case).

It must be noticed that, as explained in Section 2.1, for state $S_5$, being the absorbing state, the duration parameters $\theta_5$ have no influence on the data, since once the state $S_5$ is reached the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used with the following parameters [15]:

$$\mu_1 = \begin{bmatrix} 20 \\ 20 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 20 \\ 35 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} 35 \\ 35 \end{bmatrix}, \quad \mu_5 = \begin{bmatrix} 28 \\ 28 \end{bmatrix},$$

$$U_1 = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix}, \quad U_2 = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix}, \quad U_3 = \begin{bmatrix} 15 & -2 \\ -2 & 15 \end{bmatrix}, \quad U_4 = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \quad U_5 = \begin{bmatrix} 10 & 3 \\ 3 & 10 \end{bmatrix}, \qquad (61)$$

while for the discrete case $L = 7$ distinct observation symbols have been taken into consideration, with the following observation probability distribution:

$$B = \begin{bmatrix}
0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\
0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\
0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\
0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\
0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1
\end{bmatrix}. \qquad (62)$$

An example of simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
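For intuition, the sketch below shows one possible way to draw a hidden state sequence, its durations, and discrete observations from the left-right HSMM described by (60) and (62) with Gaussian durations; the sampling routine is our own illustrative construction, not the authors' data generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Left-right transition matrix A0 and emission matrix B from (60) and (62)
A0 = np.array([[0, 1, 0, 0, 0],
               [0, 0, 1, 0, 0],
               [0, 0, 0, 1, 0],
               [0, 0, 0, 0, 1],
               [0, 0, 0, 0, 1]], dtype=float)
B = np.array([[0.8, 0.2, 0,   0,   0,   0,   0  ],
              [0.1, 0.8, 0.1, 0,   0,   0,   0  ],
              [0,   0.1, 0.8, 0.1, 0,   0,   0  ],
              [0,   0,   0.1, 0.7, 0.1, 0.1, 0  ],
              [0,   0,   0,   0.2, 0.6, 0.1, 0.1]])
dur_mean = np.array([100, 90, 100, 80, 200])   # Gaussian duration means (Theta_N)
dur_var  = np.array([20, 15, 20, 25, 1])       # Gaussian duration variances

def simulate(T=650, absorbing=4):
    """Sample hidden states, state durations and discrete observations for T steps."""
    states, obs = [], []
    s = 0                                                   # pi puts all mass on state 1
    while len(states) < T:
        if s == absorbing:
            d = T - len(states)                             # absorbing state: stay forever
        else:
            d = max(1, int(round(rng.normal(dur_mean[s], np.sqrt(dur_var[s])))))
        for _ in range(min(d, T - len(states))):
            states.append(s)
            obs.append(rng.choice(B.shape[1], p=B[s]))      # one symbol per time step
        s = rng.choice(A0.shape[0], p=A0[s])                # next state from A0
    return np.array(states), np.array(obs)

states, symbols = simulate()
```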

5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states $N$ from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures $M$ in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fitting [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters $\lambda_0$, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters $\lambda^*$ corresponding to the minimum AIC value has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.
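A minimal sketch of this selection loop is given below. The routine `fit_hsmm(data, n_states, dur_family, n_mixtures)` is a hypothetical placeholder for one learning run from a random initialization $\lambda_0$; it is assumed to return the maximized log-likelihood and the number of free parameters $k$, so that the standard AIC value $2k - 2\ln L$ (which we take to match the definition in (46)) can be computed.

```python
from itertools import product

def select_model(train_data, fit_hsmm, n_restarts=40):
    """Grid search over HSMM structures; keep the configuration with minimum AIC.

    fit_hsmm(data, n_states, dur_family, n_mixtures) is assumed to run one
    learning procedure from a random initialization and to return
    (log_likelihood, n_free_parameters, trained_params).
    """
    best = None
    for n_states, dur_family, n_mix in product(range(2, 9),
                                               ("gaussian", "gamma", "weibull"),
                                               range(1, 5)):
        for _ in range(n_restarts):                  # random re-initializations
            loglik, k, params = fit_hsmm(train_data, n_states, dur_family, n_mix)
            aic = 2 * k - 2 * loglik                 # Akaike Information Criterion
            if best is None or aic < best[0]:
                best = (aic, n_states, dur_family, n_mix, params)
    return best   # (AIC, N, duration family, M, lambda*)
```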

The obtained results are shown in Figure 3, for both the continuous and discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered as the optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.

5.1.3. Condition Monitoring. The optimal parameters $\lambda^*$ obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_t$. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state $s^*_t = \arg\max_{1 \le i \le N}\left[\delta_t(i)\right]$, as specified in (34).
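The online monitoring loop itself is short; the sketch below assumes a hypothetical `viterbi(model, observations)` routine that returns the most likely state path (the recursion of (48) and the maximization of (34)).

```python
def online_condition_monitoring(model, stream, viterbi):
    """Re-estimate the current state each time a new observation arrives.

    stream  : iterable yielding observations x_1, x_2, ...
    viterbi : callable returning the most likely state path for x_1, ..., x_t
    """
    history = []
    for x_t in stream:
        history.append(x_t)
        path = viterbi(model, history)   # Viterbi path on all data seen so far
        yield path[-1]                   # s*_t, the current state estimate
```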


Figure 3: Akaike Information Criterion (AIC) values versus the number of states (from 2 to 8) for Gaussian, Gamma, and Weibull duration distributions: (a) continuous data, Gaussian duration distribution; (b) continuous data, Gamma duration distribution; (c) continuous data, Weibull duration distribution; (d) discrete data, Gaussian duration distribution; (e) discrete data, Gamma duration distribution; (f) discrete data, Weibull duration distribution. AIC is effective for automatic model selection, since its minimum value identifies the same number of states and duration model used to generate the data.

An example of execution of the condition monitoring experiment is shown in Figure 4, for both continuous and discrete observations. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of the testing cases.

Figure 4: Condition monitoring using the Viterbi path: (a) state estimation for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992). Each panel shows the true state sequence, the Viterbi-estimated states (correct and wrong guesses), and the observations. HSMMs can effectively solve condition monitoring problems in time-dependent applications due to their high accuracy in hidden state recognition.

The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.

5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase we considered the state $S_5$ as the failure state and used, for each HSMM configuration, the trained parameters $\lambda^*$ of Section 5.1.2. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_t$ up to time $t$. When a new observation is acquired, after the current state probability $\delta_t(i)$ is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average as well as the lower and the upper bound estimations converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches, and the estimation becomes more precise until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.

To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time $t$, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

$$\mathrm{APE}(t) = \left| \mathrm{RUL}_{\mathrm{real}}(t) - \mathrm{RUL}(t) \right|, \qquad (63)$$

where $\mathrm{RUL}_{\mathrm{real}}(t)$ is the (known) value of the RUL at time $t$, while $\mathrm{RUL}(t)$ is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

$$\overline{\mathrm{APE}} = \frac{\sum_{t=1}^{T} \mathrm{APE}(t)}{T}, \qquad (64)$$

where $T$ is the length of the testing signal. APE being a prediction error, average values of (64) close to zero correspond to good predictive performance.
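In code, the two metrics of (63) and (64) reduce to an element-wise absolute difference and its mean; a minimal NumPy sketch:

```python
import numpy as np

def ape(rul_real, rul_pred):
    """Absolute prediction error APE(t) = |RUL_real(t) - RUL(t)|, as in (63)."""
    return np.abs(np.asarray(rul_real, dtype=float) - np.asarray(rul_pred, dtype=float))

def mean_ape(rul_real, rul_pred):
    """Average absolute prediction error over the testing signal, as in (64)."""
    return float(np.mean(ape(rul_real, rul_pred)))
```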

The results, for each of the 10 testing cases and the different HSMM configurations, are reported in Tables 2(a) and 2(b) for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which the prediction errors obtained for continuous and discrete observations, respectively, are reported.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi et al.

Figure 5: Remaining Useful Lifetime estimation for (a) continuous data with Weibull duration distribution and (b) discrete data with Gamma duration distribution. Each panel shows the true RUL together with the average, upper, and lower RUL estimates over time. HSMMs effectively solve RUL estimation problems: the prediction converges to the actual RUL value, and its uncertainty decreases as the real failure time approaches.

Table 1: State recognition accuracy (%) for each duration distribution.

(a) Continuous observations

Test case    Gaussian    Gamma    Weibull
1            99.4        98.5     99.2
2            99.7        98.6     99.5
3            99.4        99.2     99.7
4            98.9        98.9     99.7
5            98.2        98.9     100
6            99.1        98.8     99.7
7            98.5        99.4     99.7
8            99.2        99.1     99.5
9            99.2        98.6     99.7
10           99.2        99.1     99.5
Average      99.1        98.9     99.6

(b) Discrete observations

Test case    Gaussian    Gamma    Weibull
1            97.4        96.7     97.4
2            97.2        97.6     96.5
3            99.4        95.8     96.6
4            98.2        95.3     97.7
5            99.1        97.4     97.5
6            97.8        97.7     97.8
7            95.8        97.2     96.6
8            97.7        96.4     97.2
9            98.9        97.2     98.5
10           99.2        95.6     96.9
Average      98.1        96.7     97.3

This is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi et al. given by (16).

5.2. Real Data. In this section, we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr), Besancon, France, with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20).

(a) APE of the RUL estimation for the continuous observation test cases

             Gaussian                    Gamma                       Weibull
Test case    APEavg   APEup   APElow    APEavg   APEup   APElow     APEavg   APEup   APElow
1            5.1      17.0    6.7       14.0     29.0    0.91       4.5      17.0    8.1
2            7.6      19.0    5.0       6.1      21.0    8.5        6.6      19.0    6.1
3            7.7      5.4     19.0      2.9      12.0    17.0       16.0     29.0    3.0
4            9.0      21.0    2.9       7.5      22.0    6.8        6.0      19.0    6.7
5            7.3      19.0    4.7       2.2      14.0    14.0       3.9      17.0    8.7
6            6.5      18.0    5.6       5.1      18.0    10.0       14.0     27.0    2.7
7            4.7      16.0    7.5       4.8      17.0    11.0       1.2      13.0    12.0
8            10.0     22.0    2.9       5.2      18.0    10.0       9.2      22.0    3.9
9            3.1      9.2     14.0      2.0      16.0    13.0       8.2      21.0    4.9
10           6.4      18.0    5.6       7.5      22.0    6.9        3.3      12.0    13.0
Average      6.8      17.0    7.4       5.7      19.0    9.9        7.3      20.0    7.0

(b) APE of the RUL estimation for the discrete observation test cases

             Gaussian                    Gamma                       Weibull
Test case    APEavg   APEup   APElow    APEavg   APEup   APElow     APEavg   APEup   APElow
1            2.1      11.0    14.0      3.1      8.8     14.0       2.4      12.0    13.0
2            2.1      11.0    13.0      11.0     22.0    3.3        19.0     32.0    7.1
3            5.1      17.0    7.6       6.6      18.0    5.1        2.3      14.0    11.0
4            5.9      6.5     18.0      5.2      17.0    6.7        4.2      16.0    9.0
5            3.2      14.0    10.0      8.3      19.0    3.4        12.0     24.0    2.9
6            12.0     24.0    2.7       6.2      18.0    5.2        4.1      8.4     16.0
7            2.9      15.0    9.7       9.3      21.0    2.3        19.0     31.0    6.6
8            15.0     27.0    7.0       7.4      18.0    4.3        4.3      17.0    9.4
9            5.9      18.0    7.7       11.0     23.0    5.5        3.9      16.0    8.8
10           3.5      11.0    14.0      5.5      6.0     16.0       5.2      17.0    7.1
Average      5.7      15.0    10.0      7.4      17.0    6.6        7.7      19.0    9.0

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox, which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.

The load profile part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.

Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32].

(a) APE of the RUL estimation for the continuous observation test cases

             Gaussian                    Gamma                       Weibull
Test case    APEavg   APEup   APElow    APEavg   APEup   APElow     APEavg   APEup   APElow
1            57.8     51.0    66.8      26.2     9.7     52.7       25.9     28.4    64.6
2            50.2     44.4    57.7      21.3     17.0    46.9       29.0     19.2    70.8
3            50.3     44.7    57.3      27.1     8.7     56.5       34.5     13.9    73.4
4            51.8     46.0    60.4      21.3     14.3    45.9       34.9     17.1    78.7
5            59.4     53.7    66.2      29.0     9.5     55.4       33.4     15.6    74.9
6            58.0     51.7    67.1      25.8     8.3     54.1       23.1     25.8    66.5
7            59.4     53.6    66.9      18.2     12.5    47.7       36.0     17.1    74.4
8            63.4     55.6    72.3      19.4     15.7    44.1       34.8     17.8    77.0
9            49.1     43.5    57.0      14.5     17.1    43.2       25.1     26.7    67.0
10           54.4     48.4    62.8      23.2     7.9     52.7       24.1     24.5    67.4
Average      55.4     49.3    63.5      22.6     12.1    49.9       30.1     20.6    71.5

(b) APE of the RUL estimation for the discrete observation test cases

             Gaussian                    Gamma                       Weibull
Test case    APEavg   APEup   APElow    APEavg   APEup   APElow     APEavg   APEup   APElow
1            51.4     41.0    62.4      42.4     31.8    53.0       32.6     26.4    73.6
2            49.6     39.9    60.4      59.5     48.3    70.8       31.3     27.6    69.3
3            50.2     38.6    62.3      46.5     35.7    57.4       32.4     25.7    70.2
4            42.2     31.5    53.8      50.1     40.5    60.6       23.7     36.1    60.3
5            44.3     33.9    55.8      47.8     37.4    59.1       36.0     25.6    76.5
6            52.2     43.2    62.7      55.2     44.3    66.9       27.2     31.6    64.3
7            55.0     43.9    66.8      56.0     45.7    67.0       34.7     23.2    74.4
8            50.3     39.0    62.0      60.4     50.5    71.0       35.1     26.4    72.4
9            55.5     47.4    64.0      48.0     37.2    59.5       31.8     22.2    73.6
10           49.0     38.2    60.7      52.1     41.2    63.1       29.4     28.9    68.7
Average      50.0     39.7    61.1      51.8     41.3    62.9       31.4     27.4    70.4

The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).

Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1                     Condition 2                     Condition 3
(1800 rpm and 4000 N)           (1650 rpm and 4200 N)           (1500 rpm and 5000 N)
Bearing       Lifetime [s]      Bearing       Lifetime [s]      Bearing       Lifetime [s]
Bearing1_1    28030             Bearing2_1    9110              Bearing3_1    5150
Bearing1_2    8710              Bearing2_2    7970              Bearing3_2    16370
Bearing1_3    23750             Bearing2_3    19550             Bearing3_3    4340
Bearing1_4    14280             Bearing2_4    7510
Bearing1_5    24630             Bearing2_5    23110
Bearing1_6    24480             Bearing2_6    7010
Bearing1_7    22590             Bearing2_7    2300

Figure 6: Global overview of the Pronostia experimental platform [19], showing the rotating module, the load module, the tested bearing, and the data acquisition module.

Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;

(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;

(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and with a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure that will occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), obtaining good estimates of the RUL is a difficult task [49].

In our experiments, we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate the RMS as

$$x^{\mathrm{RMS}}_w = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r^2_w(t)}$$

and the kurtosis as

$$x^{\mathrm{KURT}}_w = \frac{(1/L)\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^4}{\left((1/L)\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^2\right)^2},$$

where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering conditions 1 and 2 separately, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
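The two features can be computed per window with a few lines of NumPy, following the definitions above (non-overlapping windows of length $L = 2560$); the function name and array layout are our own choices.

```python
import numpy as np

def extract_features(raw_signal, L=2560):
    """Split the raw vibration signal into windows of length L and compute,
    for each window, the RMS and the (non-excess) kurtosis defined in the text."""
    n_windows = len(raw_signal) // L
    windows = np.reshape(np.asarray(raw_signal, dtype=float)[:n_windows * L], (n_windows, L))

    rms = np.sqrt(np.mean(windows ** 2, axis=1))

    centered = windows - windows.mean(axis=1, keepdims=True)
    m2 = np.mean(centered ** 2, axis=1)      # second central moment
    m4 = np.mean(centered ** 4, axis=1)      # fourth central moment
    kurtosis = m4 / m2 ** 2

    return np.column_stack([rms, kurtosis])  # one (RMS, kurtosis) pair per window
```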

5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states $N$ from 2 to 6, and (iii) an increasing number of Gaussian mixtures $M$ in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations $\lambda_0$, on the data sets (Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].

Figure 8: Raw vibration data (a) versus the RMS and kurtosis features extracted per window (b) for Bearing1_1.

Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ Gaussian mixture for the observation density.

(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme, by using for condition 1, at each iteration, Bearing1_i, $1 \le i \le 7$, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters $\lambda^*_i$ were estimated for the $i$th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time $t$, the average, lower, and upper RUL as specified in (57), (58), and (59), respectively, considering the state $S_4$ as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as well as the lower and upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.

Figure 9: AIC values versus the number of states (from 2 to 6) for the Gaussian, Gamma, and Weibull duration models: (a) Condition 1; (b) Condition 2. In both cases, the minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ mixture in the observation density.

Figure 10: RUL estimation for Bearing1_7 (a) and Bearing2_6 (b), showing the true RUL together with the average, upper, and lower RUL estimates. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.

Concerning the quantitative estimation of the predictive performance, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.

6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required parameters to estimate.

Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1

Test bearing    APEavg     APElow     APEup
Bearing1_1      10571.6    12723.0    9414.6
Bearing1_2      4331.2     3815.6     3821.3
Bearing1_3      2997.0     9730.9     6091.2
Bearing1_4      6336.3     2876.6     14871.9
Bearing1_5      1968.9     7448.4     10411.5
Bearing1_6      4253.0     9896.4     9793.7
Bearing1_7      1388.0     7494.3     10088.1
Average         4549.4     7712.2     9213.2

(b) Condition 2

Test bearing    APEavg     APElow     APEup
Bearing2_1      2475.9     5006.5     7287.5
Bearing2_2      1647.3     4497.2     8288.6
Bearing2_3      8877.1     9508.3     7962.1
Bearing2_4      1769.8     4248.6     4982.5
Bearing2_5      8663.1     10490.0    10730.0
Bearing2_6      877.1      3504.7     6687.0
Bearing2_7      3012.5     3866.4     6651.9
Average         3903.3     5874.5     7512.8

Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

This paper highlights, through experiments performed on simulated and real data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a given event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM, combined with AIC, can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\left(\bar{d}_t(i)+1\right). \qquad (A.1)$$

The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution:

$$d_t(i) \sim f(d). \qquad (A.2)$$

We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as

$$\mathrm{P}\left(d_t(i)=d\right) = \mathrm{P}\left(s_{t-d-1}\neq S_i, s_{t-d}=S_i, \ldots, s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t, \lambda\right). \qquad (A.3)$$

We omit the conditioning on the model parameters $\lambda$ in the following equations, being inherently implied. We are interested in deriving the estimator $\bar{d}_t(i)$ of $d_t(i)$, defined as its expected value (see Equation (15)):

$$\bar{d}_t(i) = \mathrm{E}\left(d_t(i) \mid s_t=S_i, \mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_t\right), \quad 1\le i\le N. \qquad (A.4)$$

From the definition of expectation we have

$$\bar{d}_t(i) = \sum_{d=1}^{t} d\cdot\mathrm{P}\left(d_t(i)=d\right) = \sum_{d=1}^{t} d\cdot\mathrm{P}\left(s_{t-d-1}\neq S_i, s_{t-d}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right). \qquad (A.5)$$

For $\bar{d}_{t+1}(i)$ we have

$$\bar{d}_{t+1}(i) = \sum_{d=1}^{t+1} d\cdot\mathrm{P}\left(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)$$
$$= \underbrace{\mathrm{P}\left(s_{t-1}\neq S_i, s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}_{(a)} \qquad (A.6)$$
$$\quad + \sum_{d=2}^{t+1} d\cdot\mathrm{P}\left(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right). \qquad (A.7)$$

By noticing that

$$\mathrm{P}\left(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) = \frac{\mathrm{P}\left(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i, s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}{\mathrm{P}\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}, \qquad (A.8)$$

we can replace the probability in the second term of (A.7) with

$$\mathrm{P}\left(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i, s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)$$
$$= \underbrace{\mathrm{P}\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}_{(b)} \qquad (A.9)$$
$$\quad \cdot \mathrm{P}\left(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right). \qquad (A.10)$$

In the last factor of (A.10) we can omit the information about the current state and observation by observing that

$$\mathrm{P}\left(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) \approx \underbrace{\mathrm{P}\left(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)}_{(c)}, \qquad (A.11)$$

if the following independencies hold:

$$s_{t+1} \perp \left\{s_{t-d+1},\ldots,s_{t-1}\right\} \mid s_t, \mathbf{x}_1,\ldots,\mathbf{x}_t,$$
$$\mathbf{x}_{t+1} \perp \left\{s_{t-d+1},\ldots,s_{t-1}\right\} \mid s_t, \mathbf{x}_1,\ldots,\mathbf{x}_t, \qquad (A.12)$$

where with $\perp$ we denote independency. The relations in (A.12) hold for HMMs (even without conditioning on $\mathbf{x}_1,\ldots,\mathbf{x}_t$), but they do not hold for HSMMs, since the state duration (expressed by $s_{t-d+1},\ldots,s_{t-1}$) determines the system evolution. On the other hand, the state duration is partially known through the observations $\mathbf{x}_1,\ldots,\mathbf{x}_t$; thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A.6), (A.9), and (A.11) we obtain

$$\bar{d}_{t+1}(i) = (a) + \sum_{d=2}^{t+1} d\cdot(b)\cdot(c)$$
$$= \underbrace{\mathrm{P}\left(s_{t-1}\neq S_i, s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}_{\mathrm{P}(A,B\mid C)=\mathrm{P}(A\mid B,C)\cdot\mathrm{P}(B\mid C)} + \sum_{d=2}^{t+1} d\cdot\underbrace{\mathrm{P}\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}_{\text{does not depend on }d}\cdot\mathrm{P}\left(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)$$
$$= \mathrm{P}\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)\cdot\Bigg[\underbrace{\mathrm{P}\left(s_{t-1}\neq S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)}_{\text{by the approximation of (A.11)}} + \sum_{d'=1}^{t}\left(d'+1\right)\cdot\mathrm{P}\left(s_{t-d'-1}\neq S_i, s_{t-d'}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)\Bigg]. \qquad (A.13)$$

Noticing that

$$\sum_{d'=1}^{t}\mathrm{P}\left(s_{t-d'-1}\neq S_i, s_{t-d'}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right) + \mathrm{P}\left(s_{t-1}\neq S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right) = 1, \qquad (A.14)$$

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A.13) as follows:

$$\bar{d}_{t+1}(i) = \mathrm{P}\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)\cdot\left(\bar{d}_t(i)+1\right). \qquad (A.15)$$

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state $i$ in the previous step.

In order to express (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar{d}_{t+1}(i)$, we can consider the following equality:

$$\mathrm{P}\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) = \frac{\mathrm{P}\left(s_t=S_i, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}{\underbrace{\mathrm{P}\left(s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}_{\gamma_{t+1}(i)}}. \qquad (A.16)$$

If we consider the terms involved in the probability at the numerator of the right-hand side of (A.16), we have that

$$\underbrace{\mathbf{x}_1,\ldots,\mathbf{x}_t}_{B} \perp \underbrace{\mathbf{x}_{t+1}}_{C} \mid \underbrace{s_t=S_i, s_{t+1}=S_i}_{A}. \qquad (A.17)$$

If $B\perp C\mid A$, by the Bayes rule we have that

$$\mathrm{P}(A\mid C,B) = \frac{\mathrm{P}(C\mid A,B)\cdot\mathrm{P}(A\mid B)}{\mathrm{P}(C\mid B)}. \qquad (A.18)$$

Hence, we can rewrite the numerator of the right-hand side of (A.16) as follows:

$$\mathrm{P}\left(s_t=S_i, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) = \frac{\mathrm{P}\left(s_t=S_i, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_t\right)\cdot\overbrace{\mathrm{P}\left(\mathbf{x}_{t+1} \mid s_t=S_i, s_{t+1}=S_i\right)}^{\mathbf{x}_{t+1}\perp s_t\mid s_{t+1}}}{\mathrm{P}\left(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\ldots,\mathbf{x}_t\right)}$$
$$= \frac{\mathrm{P}\left(s_{t+1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)\cdot\overbrace{\mathrm{P}\left(s_t=S_i\mid\mathbf{x}_1,\ldots,\mathbf{x}_t\right)}^{\gamma_t(i)}\cdot\overbrace{\mathrm{P}\left(\mathbf{x}_{t+1}\mid s_{t+1}=S_i\right)}^{b_i(\mathbf{x}_{t+1})}}{\mathrm{P}\left(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\ldots,\mathbf{x}_t\right)}. \qquad (A.19)$$

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

$$\mathrm{P}\left(s_{t+1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right) = \sum_{\mathbf{d}_t} a_{ii}(\mathbf{d}_t)\cdot\mathrm{P}\left(\mathbf{d}_t\mid\mathbf{x}_1,\ldots,\mathbf{x}_t\right) \approx a_{ii}(\bar{\mathbf{d}}_t), \qquad (A.20)$$

while the denominator of (A.19) can be expressed as follows:

$$\mathrm{P}\left(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\ldots,\mathbf{x}_t\right) = \frac{\mathrm{P}\left(\mathbf{x}_1,\ldots,\mathbf{x}_t,\mathbf{x}_{t+1}\right)}{\mathrm{P}\left(\mathbf{x}_1,\ldots,\mathbf{x}_t\right)} = \frac{\sum_{i=1}^{N}\alpha_{t+1}(i)}{\sum_{i=1}^{N}\alpha_t(i)}. \qquad (A.21)$$

By substituting (A.20) and (A.21) in (A.19), we obtain

$$\mathrm{P}\left(s_t=S_i, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)}, \qquad (A.22)$$

and then, by combining (A.22) and (A.16), we obtain

$$\mathrm{P}\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\cdot\sum_{i=1}^{N}\alpha_{t+1}(i)}. \qquad (A.23)$$

Finally, by substituting (A.23) in (A.15) and considering that

$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)}, \qquad (A.24)$$

we derive the induction formula for $\bar{d}_{t+1}(i)$ in terms of model parameters as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\left(\bar{d}_t(i)+1\right). \qquad (A.25)$$
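Numerically, the induction of (A.25) is a single multiply-accumulate per state once the forward variables are available. A minimal sketch, assuming the forward variables $\alpha_t$, $\alpha_{t+1}$, the duration-dependent self-transition probabilities $a_{ii}(\bar{\mathbf{d}}_t)$, and the emission likelihoods $b_i(\mathbf{x}_{t+1})$ have already been computed elsewhere:

```python
import numpy as np

def update_duration(d_bar, alpha_t, alpha_t1, a_ii, b_next, eps=1e-300):
    """One step of the induction (A.25) for all states at once.

    d_bar    : current duration estimates d_bar_t(i), shape (N,)
    alpha_t  : forward variables at time t, shape (N,)
    alpha_t1 : forward variables at time t+1, shape (N,)
    a_ii     : self-transition probabilities a_ii(d_bar_t), shape (N,)
    b_next   : emission likelihoods b_i(x_{t+1}), shape (N,)
    """
    weight = a_ii * alpha_t * b_next / np.maximum(alpha_t1, eps)
    return weight * (d_bar + 1.0)
```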

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka, Japan and Kariya, Japan, October–November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines – prognostics – part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universitat zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 2006.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.



the underlying stochastic process of the state to be a semi-Markov chain instead of a simple Markov chain of a HMMSuch model is referred to as Hidden Semi-Markov Model(HSMM) [21] HSMMs and explicit duration modeles havebeen proven beneficial for many applications [22ndash25] Acomplete overview of different duration model classes hasbeenmade byYu [26]Most state durationmodels used in theliterature are nonparametric discrete distributions [27ndash29]As a consequence the number of parameters that describe themodel and that have to be estimated is high and consequentlythe learning procedure can be computationally expensive forreal complex applications Moreover it is necessary to specifya priori the maximum duration allowed in each state

To alleviate the high dimensionality of the parameterspace parametric duration models have been proposed Forexample Salfner [6] proposed a generic parametric continu-ous distribution to model the state sojourn time However intheir model the observation has been assumed to be discreteand applied to recognize failure-prone observation sequenceUsing continuous observation Azimi et al [30ndash32] specifiedan HSMM with parametric duration distribution belongingto the Gamma family and modeled the observation processby a Gaussian

Inspired by the latter two approaches, in this work we propose a generic specification of a parametric HSMM in which no constraints are imposed on the model of the state duration or on the observation process. In our approach, the state duration is modeled as a generic parametric density function. The observations, in turn, can be modeled either as a discrete stochastic process or as a continuous mixture of Gaussians; the latter has been shown to approximate arbitrarily closely any finite continuous density function [33]. The proposed model can therefore be used in a wide range of applications and types of data. Moreover, in this paper we introduce a new and more effective estimator of the time spent by the system in a given state prior to the current time. To the best of our knowledge, apart from the works referred to above, the literature on HSMMs applied to prognosis and predictive maintenance for industrial machines is limited [34]. Hence, the present work aims to show the effectiveness of the proposed duration model in solving condition monitoring and RUL estimation problems.

When dealing with state space models, and in particular with HSMMs, one should define the number of states, the correct family of duration densities, and, in the case of continuous observations, the adequate number of Gaussian mixture components. Such parameters play a prominent role, since the right model configuration is essential to enable an accurate modeling of the dynamic pattern and the covariance structure of the observed time series. The estimation of a satisfactory model configuration is referred to as model selection in the literature.

While several state-of-the-art approaches use expert knowledge to gain insight into the model structure [15, 35, 36], an automated methodology for model selection is often required. In the literature, model selection has been deeply studied for a wide range of models. Among the existing methodologies, information-based techniques have been extensively analyzed, with satisfactory results.

Although the Bayesian Information Criterion (BIC) is particularly appropriate for finite mixture models [37, 38], the Akaike Information Criterion (AIC) has been demonstrated to outperform BIC when applied to more complex models and when the sample size is limited [39, 40], which is the case for the target application of this paper.

In this work, AIC is used to estimate the correct model configuration, with the final goal of an automated HSMM model selection that exploits only the information available in the input data. While model selection techniques have been extensively used in the framework of Hidden Markov Models [41–43], to the best of our knowledge the present work is the first to propose their application to duration models and, in particular, to HSMMs.

In summary, the present work contributes to condition monitoring, predictive maintenance, and RUL estimation problems by

(i) proposing a general Hidden Semi-Markov Model applicable to continuous or discrete observations and with no constraints on the density function used to model the state duration;

(ii) proposing a more effective estimator of the state duration variable $d_t(i)$, that is, the time spent by the system in the $i$th state prior to the current time $t$;

(iii) adapting the learning, inference, and prediction algorithms considering the defined HSMM parameters and the proposed $d_t(i)$ estimator;

(iv) using the Akaike Information Criterion for automatic model selection.

The rest of the paper is organized as follows: in Section 2 we introduce the theory of the proposed HSMM, together with its learning, inference, and prediction algorithms. Section 3 gives a short theoretical overview of the Akaike Information Criterion. Section 4 presents the methodology used to estimate the Remaining Useful Lifetime using the proposed HSMM. In Section 5, experimental results are discussed. Conclusions and future research directions are given in Section 6.

2. Hidden Semi-Markov Models

Hidden Semi-Markov Models (HSMMs) introduce the concept of variable state duration, which results in a more accurate modeling power when the system being modeled shows a dependence on time.

In this section we give the specification of the proposed HSMM, for which we model the state duration with a parametric, state-dependent distribution. Compared to nonparametric modeling, this approach has two main advantages:

(i) the model is specified by a limited number of parameters; as a consequence, the learning procedure is computationally less expensive;

(ii) the model does not require a priori knowledge of the maximum sojourn time allowed in each state, this being inherently learnt through the duration distribution parameters.


2.1. Model Specification. A Hidden Semi-Markov Model is a doubly embedded stochastic model with an underlying stochastic process that is not observable (hidden) but can only be observed through another set of stochastic processes that produce the sequence of observations. An HSMM allows the underlying process to be a semi-Markov chain with a variable duration, or sojourn time, for each state. The key concept of HSMMs is that the semi-Markov property holds for this model: while in HMMs the Markov property implies that the value of the hidden state at time $t$ depends exclusively on its value at time $t-1$, in HSMMs the probability of a transition from state $S_j$ to state $S_i$ at time $t$ depends on the duration spent in state $S_j$ prior to time $t$.

In the following, we denote the number of states in the model as $N$, the individual states as $S = \{S_1, \ldots, S_N\}$, and the state at time $t$ as $s_t$. The semi-Markov property can be written as
$$P(s_{t+1} = S_i \mid s_t = S_j, \ldots, s_1 = S_k) = P(s_{t+1} = S_i \mid s_t = S_j, d_t(j)), \quad 1 \le i, j, k \le N, \tag{1}$$
where the duration variable $d_t(j)$ is defined as the time spent in state $S_j$ prior to time $t$.

Although the state duration is inherently discrete, in many studies [44, 45] it has been modeled with a continuous parametric density function. Similarly to the work of Azimi et al. [30–32], in this paper we use the discrete counterpart of the chosen parametric probability density function (pdf). With this approximation, if we denote the pdf of the sojourn time in state $S_i$ as $f(x; \theta_i)$, where $\theta_i$ represents the set of parameters of the pdf relative to the $i$th state, the probability that the system stays in state $S_i$ for exactly $d$ time steps can be calculated as $\int_{d-1}^{d} f(x; \theta_i)\,dx$. Considering the HSMM formulation, we can generally denote the state-dependent duration distributions by the set of their parameters relative to each state as $\Theta = \{\theta_1, \ldots, \theta_N\}$.

Many related works on HSMMs [31, 32, 44, 45] consider $f(x; \theta_i)$ within the exponential family; in particular, Gamma distributions are often used in speech processing applications. In this work we do not impose a type of distribution function to model the duration. The only requirement is that the duration be modeled by a positive function, negative durations being physically meaningless.

HSMMs also require the definition of a "dynamic" transition matrix, as a consequence of the semi-Markov property. Differently from HMMs, in which a constant transition probability leads to a geometrically distributed state sojourn time, HSMMs explicitly define a transition matrix which, depending on the duration variable, has increasing probabilities of changing state as time goes on. For convenience, we specify the state duration variable in the form of a vector $\mathbf{d}_t$ with dimensions $N \times 1$ as
$$\mathbf{d}_t(j) = \begin{cases} d_t(j) & \text{if } s_t = S_j, \\ 1 & \text{if } s_t \ne S_j. \end{cases} \tag{2}$$

The quantity $d_t(j)$ can be easily calculated by induction from $d_{t-1}(j)$ as
$$d_t(j) = s_t(j) \cdot s_{t-1}(j) \cdot d_{t-1}(j) + 1, \tag{3}$$
where $s_t(j)$ is 1 if $s_t = S_j$ and 0 otherwise.
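As a minimal illustration of (2) and (3), the Python sketch below (a hypothetical helper written for this explanation, not part of the original formulation) updates the duration vector given the previous duration vector and the states at the two consecutive time steps:

```python
import numpy as np

def update_duration_vector(d_prev, s_prev, s_curr, N):
    """Update the duration vector of (2) using the induction of (3).

    d_prev : (N,) array, duration vector d_{t-1}
    s_prev, s_curr : integer indices of the states at t-1 and t
    """
    d = np.ones(N)                        # states other than the current one get duration 1
    if s_curr == s_prev:                  # s_t(j) * s_{t-1}(j) = 1 only for the persisting state
        d[s_curr] = d_prev[s_curr] + 1    # extend the sojourn in the current state
    return d

# Example: state sequence (S1, S1, S2) with N = 3 states (0-based indices)
d1 = np.ones(3)                            # [1, 1, 1]
d2 = update_duration_vector(d1, 0, 0, 3)   # [2, 1, 1]
d3 = update_duration_vector(d2, 0, 1, 3)   # [1, 1, 1]
```

This reproduces the duration sequence used later in the counterexample of Section 2.2.1.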

If we assume that at time $t$ the system is in state $S_i$, we can formally define the duration-dependent transition matrix as $A_{\mathbf{d}_t} = [a_{ij}(\mathbf{d}_t)]$, with
$$a_{ij}(\mathbf{d}_t) = P(s_{t+1} = S_j \mid s_t = S_i, d_t(i)), \quad 1 \le i, j \le N. \tag{4}$$

The specification of the model can be further simplified by observing that, at each time $t$, the matrix $A_{\mathbf{d}_t}$ can be decomposed into two terms: the recurrent and the nonrecurrent state transition probabilities.

The recurrent transition probabilities $P(\mathbf{d}_t) = [p_{ij}(\mathbf{d}_t)]$, which depend only on the duration vector $\mathbf{d}_t$ and the parameters $\Theta$, take into account the dynamics of the self-transition probabilities. They are defined as the probability of remaining in the current state at the next time step, given the duration spent in the current state prior to time $t$:

$$\begin{aligned} p_{ii}(\mathbf{d}_t) &= P(s_{t+1} = S_i \mid s_t = S_i, d_t(i)) \\ &= P(s_{t+1} = S_i \mid s_t = S_i, s_{t-1} = S_i, \ldots, s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \ne S_i) \\ &= \frac{P(s_{t+1} = S_i, s_t = S_i, \ldots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \ne S_i)}{P(s_t = S_i, s_{t-1} = S_i, \ldots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \ne S_i)}. \end{aligned} \tag{5}$$

The denominator in (5) can be expressed as $\sum_{k=1}^{\infty} P(s_{t+k} = S_i, s_{t+k-1} = S_i, \ldots, s_{t-d_t(i)+2} = S_i \mid s_{t-d_t(i)+1} = S_i, s_{t-d_t(i)} \ne S_i)$, which is the probability that the system at time $t$ has been staying in state $S_i$ for at least $d_t(i) - 1$ time units. This expression is equivalent to $1 - F(d_t(i) - 1; \theta_i)$, where $F(\cdot\,; \theta_i)$ is the duration cumulative distribution function relative to state $S_i$, that is, $F(d; \theta) = \int_{-\infty}^{d} f(x; \theta)\,dx$. As a consequence, from (5) we can define the recurrent transition probabilities as a diagonal matrix with dimensions $N \times N$:
$$P(\mathbf{d}_t) = [p_{ij}(\mathbf{d}_t)] = \begin{cases} \dfrac{1 - F(d_t(i); \theta_i)}{1 - F(d_t(i) - 1; \theta_i)} & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases} \tag{6}$$

The usage of the cumulative distribution function in (6), which tends to 1 as the duration tends to infinity, implies that the probability of self-transition decreases as the sojourn time increases, leading the model to always leave the current state as time approaches infinity.

The nonrecurrent state transition probabilities $A^0 = [a^0_{ij}]$ rule the transitions between two different states. They are represented by an $N \times N$ matrix with the diagonal elements equal to zero, defined as
$$A^0 = [a^0_{ij}] = \begin{cases} 0 & \text{if } i = j, \\ P(s_{t+1} = S_j \mid s_t = S_i) & \text{if } i \ne j. \end{cases} \tag{7}$$
$A^0$ must be specified as a stochastic matrix, that is, its elements have to satisfy the constraint $\sum_{j=1}^{N} a^0_{ij} = 1$ for all $i$.

As a consequence of the above decomposition, the dynamics of the underlying semi-Markov chain can be defined by specifying only the state-dependent duration parameters $\Theta$ and the nonrecurrent matrix $A^0$, since the model transition matrix can be calculated at each time $t$ using (6) and (7) as
$$A_{\mathbf{d}_t} = P(\mathbf{d}_t) + (I - P(\mathbf{d}_t))A^0, \tag{8}$$
where $I$ is the identity matrix. If we denote the elements of the dynamic transition matrix $A_{\mathbf{d}_t}$ as $a_{ij}(\mathbf{d}_t)$, the stochastic constraint $\sum_{j=1}^{N} a_{ij}(\mathbf{d}_t) = 1$, for all $i$ and $t$, is guaranteed by the fact that $P(\mathbf{d}_t)$ is a diagonal matrix and $A^0$ is a stochastic matrix.
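As a sketch of how (6) and (8) can be computed in practice, the snippet below builds the dynamic transition matrix for a given duration vector, here assuming Gamma-distributed sojourn times purely as an example (any positive parametric distribution with a CDF could be substituted); the function names and parameter values are illustrative, not taken from the paper:

```python
import numpy as np
from scipy.stats import gamma

def recurrent_probs(d_vec, shapes, scales):
    """Diagonal matrix P(d_t) of (6): survival-function ratio of the duration CDF."""
    d_vec = np.asarray(d_vec, dtype=float)
    surv_now = 1.0 - gamma.cdf(d_vec, a=shapes, scale=scales)
    surv_prev = 1.0 - gamma.cdf(d_vec - 1.0, a=shapes, scale=scales)
    return np.diag(surv_now / np.maximum(surv_prev, 1e-300))  # guard against division by zero

def dynamic_transition_matrix(d_vec, shapes, scales, A0):
    """A_{d_t} = P(d_t) + (I - P(d_t)) A0, as in (8)."""
    P = recurrent_probs(d_vec, shapes, scales)
    return P + (np.eye(len(d_vec)) - P) @ A0

# Example with N = 2 states and a simple nonrecurrent matrix
A0 = np.array([[0.0, 1.0],
               [1.0, 0.0]])
A_dt = dynamic_transition_matrix([5, 1], shapes=[25.0, 25.0], scales=[2.0, 2.0], A0=A0)
# Each row of A_dt sums to 1, as required by the stochastic constraint.
```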

For several applications it is necessary to model an absorbing state, which in the case of industrial equipment corresponds to the "broken" or "failure" state. If we denote the absorbing state as $S_k$, with $k \in [1, N]$, we must fix the $k$th row of the nonrecurrent matrix $A^0$ to be $a^0_{kk} = 1$ and $a^0_{ki} = 0$ for all $1 \le i \le N$ with $i \ne k$. By substituting such an $A^0$ matrix in (8), it is easy to show that the element $a_{kk}(\mathbf{d}_t) = 1$ and remains constant for all $t$, while the duration parameters $\theta_k$ have no influence on the absorbing state $S_k$. An example of absorbing state specification will be given in Section 5.

With respect to the input observation signals, in this work

we consider both continuous and discrete data, by adopting a suitable observation model depending on the nature of the observations. In particular, for the continuous case we model the observations with a multivariate mixture of Gaussian distributions. This choice presents two main advantages: (i) a multivariate model allows dealing with multiple observations at the same time, which is often the case when modeling industrial equipment, since at each time measurements from multiple sensors are available, and (ii) a mixture of Gaussians has been proven to closely approximate any finite and continuous density function [33]. Formally, if we denote by $\mathbf{x}_t$ the observation vector at time $t$ and by $\mathbf{x}$ the generic observation vector being modeled, the observation density for the $j$th state is represented by a finite mixture of $M$ Gaussians:
$$b_j(\mathbf{x}) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_{jm}, U_{jm}), \quad 1 \le j \le N, \tag{9}$$
where $c_{jm}$ is the mixture coefficient for the $m$th mixture in state $S_j$, which satisfies the stochastic constraints $\sum_{m=1}^{M} c_{jm} = 1$ for $1 \le j \le N$ and $c_{jm} \ge 0$ for $1 \le j \le N$, $1 \le m \le M$, while $\mathcal{N}$ is the Gaussian density with mean vector $\boldsymbol{\mu}_{jm}$ and covariance matrix $U_{jm}$ for the $m$th mixture component in state $j$.
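For reference, a minimal sketch of evaluating the mixture density of (9) for a single state is given below; the function name and the two-component example parameters are chosen here for illustration only and are not taken from the paper:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, weights, means, covs):
    """Evaluate b_j(x) of (9) for one state: sum_m c_{jm} N(x; mu_{jm}, U_{jm})."""
    return sum(c * multivariate_normal.pdf(x, mean=mu, cov=U)
               for c, mu, U in zip(weights, means, covs))

# Example: a bivariate, two-component mixture for a single state
weights = [0.6, 0.4]
means = [np.array([20.0, 20.0]), np.array([35.0, 35.0])]
covs = [np.eye(2) * 20.0, np.eye(2) * 15.0]
print(mixture_density(np.array([22.0, 21.0]), weights, means, covs))
```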

In the case of discrete data, we model the observations within each state with a nonparametric discrete probability distribution. In particular, if $L$ is the number of distinct observation symbols per state and if we denote the symbols as $X = \{X_1, \ldots, X_L\}$ and the observation at time $t$ as $x_t$, the observation symbol probability distribution can be defined as a matrix $B = [b_j(l)]$ of dimensions $N \times L$, where
$$b_j(l) = P[x_t = X_l \mid s_t = S_j], \quad 1 \le j \le N,\ 1 \le l \le L. \tag{10}$$
Since the system in each state at each time step can emit one of the $L$ possible symbols, the matrix $B$ is stochastic, that is, it is constrained to $\sum_{l=1}^{L} b_j(l) = 1$ for all $1 \le j \le N$.

Finally, as in the case of HMMs, we specify the initial state distribution $\pi = \{\pi_i\}$, which defines the probability of the starting state, as
$$\pi_i = P[s_1 = S_i], \quad 1 \le i \le N. \tag{11}$$

From the above considerations, two different HSMM models can be considered: in the case of continuous observations, $\lambda = (A^0, \Theta, C, \mu, U, \pi)$, and in the case of discrete observations, the HSMM is characterized by $\lambda = (A^0, \Theta, B, \pi)$. An example of a continuous HSMM with 3 states is shown in Figure 1.

2.2. Learning and Inference Algorithms. Let us denote the generic sequence of observations, being indiscriminately continuous vectors or discrete symbols, as $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$. In order to use the defined HSMM model in practice, similarly to the HMM, we need to solve three basic problems:

(1) Given the observation $\mathbf{x}$ and a model $\lambda$, calculate the probability that the sequence $\mathbf{x}$ has been generated by the model $\lambda$, that is, $P(\mathbf{x} \mid \lambda)$.

(2) Given the observation $\mathbf{x}$ and a model $\lambda$, calculate the state sequence $S = s_1 s_2 \cdots s_T$ which has most probably generated the sequence $\mathbf{x}$.

(3) Given the observation $\mathbf{x}$, find the parameters of the model $\lambda$ which maximize $P(\mathbf{x} \mid \lambda)$.

As in the case of HMMs, solving the above problems requires using the forward-backward [13], decoding (Viterbi [46] and Forney [47]), and Expectation-Maximization [48] algorithms, which will be adapted to the HSMM introduced in Section 2.1. In the following we also propose a more effective estimator of the state duration variable $d_t(i)$ defined in (2).

2.2.1. The Forward-Backward Algorithm. Given a generic sequence of observations $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, the goal is to calculate the model likelihood, that is, $P(\mathbf{x} \mid \lambda)$. This quantity is useful for the training procedure, where the parameters that locally maximize the model likelihood are chosen, as well as for classification problems. The latter is the case in which the observation sequence $\mathbf{x}$ has to be mapped to one of a finite set of $C$ classes, each represented by a set of HSMM parameters $L = \{\lambda_1, \ldots, \lambda_C\}$; the class of $\mathbf{x}$ is chosen such that $\lambda(\mathbf{x}) = \arg\max_{\lambda \in L} P(\mathbf{x} \mid \lambda)$.

To calculate the model likelihood, we first define the forward variable at each time $t$ as
$$\alpha_t(i) = P(\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, s_t = S_i \mid \lambda), \quad 1 \le i \le N. \tag{12}$$


[Figure 1 depicts three hidden states $S_1$, $S_2$, $S_3$ connected by transitions $a_{12}$ and $a_{23}$, with, for each state, the corresponding observation probabilities $P(o \mid S_1)$, $P(o \mid S_2)$, $P(o \mid S_3)$ and sojourn probabilities $d_1(u)$, $d_2(u)$, $d_3(u)$.]

Figure 1: Graphical representation of an HSMM.

Contrary to HMMs, for HSMMs the state duration must be taken into account in the calculation of the forward variable. Consequently, Yu [26] proposed the following inductive formula:
$$\alpha_t(j, d) = \sum_{d'=1}^{D} \sum_{i=1}^{N} \left( \alpha_{t-d'}(i, d')\, a^0_{ij}\, p_{jj}(d') \prod_{k=t-d+1}^{t} b_j(\mathbf{x}_k) \right), \quad 1 \le j \le N,\ 1 \le t \le T, \tag{13}$$
that is, the sum of the probabilities of being in the current state $S_j$ (at time $t$) for the past $d'$ time units (with $1 \le d' \le D$ and $D$ the maximum allowed duration for each state), coming from all the possible previous states $S_i$, $1 \le i \le N$, $i \ne j$.

The disadvantage of the above formulation is that, as discussed in the Introduction, the specification of the maximum duration $D$ represents a limitation to the modeling generalization. Moreover, from (13) it is clear that the computational and memory complexities drastically increase with $D$, which can be very large in many applications, in particular for online failure prediction.

To alleviate this problem, Azimi et al. [30–32] introduced a new forward algorithm for HSMMs that, by keeping track of the estimated average state duration at each iteration, has a computational complexity comparable to that of the forward algorithm for HMMs [13]. However, the average state duration represents an approximation. Consequently, the forward algorithm of Azimi, compared with (13), pays the price of a lower precision in favor of an (indispensable) better computational efficiency.

To calculate the forward variable $\alpha_t(j)$ using Azimi's approach, the duration-dependent transition matrix defined in (8) is taken into consideration in the induction formula of (13), which becomes [30]
$$\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}(\mathbf{d}_{t-1}) \right] b_j(\mathbf{x}_t). \tag{14}$$
To calculate the above formula, the average state duration of (2) must be estimated for each time $t$ by means of the variable $\bar{\mathbf{d}}_t = [\bar{d}_t(i)]$, defined as
$$\bar{d}_t(i) = E(d_t(i) \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, s_t = S_i, \lambda), \quad 1 \le i \le N, \tag{15}$$
where $E$ denotes the expected value. To calculate the above quantity, Azimi et al. [30–32] use the following formula:
$$\bar{\mathbf{d}}_t = \boldsymbol{\gamma}_{t-1} \odot \bar{\mathbf{d}}_{t-1} + 1, \tag{16}$$

where $\odot$ represents the element-by-element product between two matrices/vectors and the vector $\boldsymbol{\gamma}_t = [\gamma_t(i)]$ (the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters), with dimensions $N \times 1$, is calculated in terms of $\alpha_t(i)$ as
$$\gamma_t(i) = P(s_t = S_i \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}, \quad 1 \le i \le N. \tag{17}$$

Equation (16) is based on the following induction formula [30–32], which rules the dynamics of the duration vector when the system's state is known:
$$d_t(i) = s_{t-1}(i) \cdot d_{t-1}(i) + 1, \tag{18}$$
where, for each $t$, $s_t(i)$ is 1 if $s_t = S_i$ and 0 otherwise.


A simple example shows that (18) is incorrect: assuming an HSMM with three states and considering the state sequence $(S_1, S_1, S_2, \ldots)$, the correct sequence of the duration vector is $\mathbf{d}_1 = [1\ 1\ 1]^T$, $\mathbf{d}_2 = [2\ 1\ 1]^T$, and $\mathbf{d}_3 = [1\ 1\ 1]^T$, where the superscript $T$ denotes vector transpose. If we apply (18), we obtain $\mathbf{d}_1 = [1\ 1\ 1]^T$, $\mathbf{d}_2 = [2\ 1\ 1]^T$, and $\mathbf{d}_3 = [1\ 2\ 1]^T$, which is in contradiction with the definition of the state duration vector given in (2).

To calculate the average state duration variable $\bar{d}_t(i)$, we propose a new induction formula that estimates, for each time $t$, the time spent in the $i$th state prior to $t$ as
$$\bar{d}_t(i) = P(s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) \cdot (\bar{d}_{t-1}(i) + 1) \tag{19}$$
$$= \frac{a_{ii}(\bar{\mathbf{d}}_{t-1}) \cdot \alpha_{t-1}(i) \cdot b_i(\mathbf{x}_t)}{\alpha_t(i)} \cdot (\bar{d}_{t-1}(i) + 1), \quad 1 \le i \le N. \tag{20}$$

The derivation of (20) is given in the Appendix. The intuition behind (19) is that the current average duration is the previous average duration plus one, weighted by the "amount" of the current state that was already in state $S_i$ in the previous step.

Using the proposed estimator (20), the forward algorithm can be specified as follows:

(1) initialization, with $1 \le i \le N$:
$$\alpha_1(i) = \pi_i\, b_i(\mathbf{x}_1), \qquad \bar{d}_1(i) = 1, \qquad A_{\mathbf{d}_1} = P(\mathbf{d}_1) + (I - P(\mathbf{d}_1))A^0, \tag{21}$$
where $P(\mathbf{d}_1)$ is estimated using (6);

(2) induction, with $1 \le i, j \le N$ and $1 \le t \le T - 1$:
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij}(\mathbf{d}_t) \right] b_j(\mathbf{x}_{t+1}), \tag{22}$$
$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\mathbf{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1), \tag{23}$$
$$A_{\mathbf{d}_{t+1}} = P(\mathbf{d}_{t+1}) + (I - P(\mathbf{d}_{t+1}))A^0, \tag{24}$$
where $a_{ij}(\mathbf{d}_t)$ are the coefficients of the matrix $A_{\mathbf{d}_t}$;

(3) termination:
$$P(\mathbf{x} \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i). \tag{25}$$
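A Python sketch of the recursion (21)–(25) with the proposed duration estimator is given below for illustration; it is not part of the original paper, and it assumes two helpers, `emission(j, x)` returning $b_j(\mathbf{x})$ and `dyn_A(d_bar)` returning $A_{\mathbf{d}_t}$ as in (8), which are left to the reader (numerical scaling or log-space arithmetic, needed for long sequences, is omitted for brevity):

```python
import numpy as np

def forward_pass(X, pi, emission, dyn_A, N):
    """Forward algorithm (21)-(25) using the duration estimator of (23).

    X        : sequence of observations (length T)
    pi       : (N,) initial state distribution
    emission : emission(j, x) -> b_j(x)                     (assumed helper)
    dyn_A    : dyn_A(d_bar) -> N x N matrix A_{d_t} of (8)   (assumed helper)
    """
    T = len(X)
    alpha = np.zeros((T, N))
    d_bar = np.ones(N)                                       # bar{d}_1(i) = 1
    alpha[0] = pi * np.array([emission(j, X[0]) for j in range(N)])
    for t in range(T - 1):
        A = dyn_A(d_bar)                                     # (24), built from (6) and (8)
        b_next = np.array([emission(j, X[t + 1]) for j in range(N)])
        alpha[t + 1] = (alpha[t] @ A) * b_next               # (22)
        with np.errstate(divide="ignore", invalid="ignore"):
            d_bar = np.where(alpha[t + 1] > 0,
                             np.diag(A) * alpha[t] * b_next / alpha[t + 1] * (d_bar + 1),
                             1.0)                            # (23), with a guard for zero alpha
    return alpha, alpha[-1].sum()                            # likelihood P(x | lambda) of (25)
```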

Similar considerations as for the forward procedure can be made for the backward algorithm, which is implemented by defining the variable $\beta_t(i)$ as
$$\beta_t(i) = P(\mathbf{x}_{t+1} \mathbf{x}_{t+2} \cdots \mathbf{x}_T \mid s_t = S_i, \lambda), \quad 1 \le i \le N. \tag{26}$$
Having estimated the dynamic transition matrix $A_{\mathbf{d}_t}$ for each $1 \le t \le T$ using (24), the backward variable can be calculated inductively as follows:

(1) initialization:
$$\beta_T(i) = 1, \quad 1 \le i \le N; \tag{27}$$

(2) induction:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}(\mathbf{d}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1,\ 1 \le i \le N. \tag{28}$$

Although the variable $\beta_t(i)$ is not necessary for the calculation of the model likelihood, it will be useful in the parameter reestimation procedure, as explained in Section 2.2.3.

2.2.2. The Viterbi Algorithm. The Viterbi algorithm [46, 47] (also known as decoding) allows determining the best state sequence corresponding to a given observation sequence. Formally, given a sequence of observations $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, the best state sequence $S^* = s^*_1 s^*_2 \cdots s^*_T$ corresponding to $\mathbf{x}$ is calculated by defining the variable $\delta_t(i)$ as
$$\delta_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P(s_1 s_2 \cdots s_t = S_i, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t \mid \lambda). \tag{29}$$
The procedure to recursively calculate the variable $\delta_t(i)$ and to retrieve the target state sequence (i.e., the arguments which maximize the $\delta_t(i)$'s) for the proposed HSMM is a straightforward extension of the Viterbi algorithm for HMMs [13]. The only change is the usage, in the recursive calculation of $\delta_t(i)$, of the dynamic transition matrix $A_{\mathbf{d}_t} = [a_{ij}(\mathbf{d}_t)]$ calculated through (24). The Viterbi algorithm for the introduced parametric HSMMs can be summarized as follows:

(1) initialization, with $1 \le i \le N$:
$$\delta_1(i) = \pi_i\, b_i(\mathbf{x}_1), \qquad \psi_1(i) = 0; \tag{30}$$

(2) recursion, with $1 \le j \le N$ and $2 \le t \le T$:
$$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}(\mathbf{d}_t)]\, b_j(\mathbf{x}_t), \tag{31}$$
$$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}(\mathbf{d}_t)]; \tag{32}$$

(3) termination:
$$P^* = \max_{1 \le i \le N} [\delta_T(i)], \tag{33}$$
$$s^*_T = \arg\max_{1 \le i \le N} [\delta_T(i)], \tag{34}$$
where we keep track of the argument maximizing (31) using the vector $\psi_t$, which, tracked back, gives the desired best state sequence:
$$s^*_t = \psi_{t+1}(s^*_{t+1}), \quad t = T-1, T-2, \ldots, 1. \tag{35}$$
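A compact Python sketch of the recursion (30)–(35) is given below for reference (illustrative only); it assumes that the per-time-step dynamic transition matrices `A_list[t]` and the emission probabilities `B[t, j]` have already been computed, for example during the forward pass:

```python
import numpy as np

def viterbi(A_list, B, pi):
    """Viterbi decoding (30)-(35) for the parametric HSMM.

    A_list : list of T dynamic transition matrices A_{d_t} (N x N)
    B      : (T, N) array with B[t, j] = b_j(x_t)
    pi     : (N,) initial state distribution
    """
    T, N = B.shape
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[0]                                   # (30)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A_list[t]         # delta_{t-1}(i) * a_{ij}(d_t)
        psi[t] = scores.argmax(axis=0)                     # (32)
        delta[t] = scores.max(axis=0) * B[t]               # (31)
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()                          # (34)
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]                  # (35), backtracking
    return path, delta[-1].max()                           # best state sequence and P* of (33)
```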


2.2.3. The Training Algorithm. The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by $\lambda = (A^0, \Theta, C, \mu, U, \pi)$ if the observations are continuous, or by $\lambda = (A^0, \Theta, B, \pi)$ if the observations are discrete. Given a generic observation sequence $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, referred to as the training set in the following, the training procedure consists of finding the model parameter set $\lambda^*$ which locally maximizes the model likelihood $P(\mathbf{x} \mid \lambda)$.

We use the modified Baum-Welch algorithm of Azimi et al. [30–32]. However, in our implementation we make no assumption on the density function used to model the state duration, and we consider both continuous and discrete observations.

Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure which consists of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters $\lambda^0$, and it is iterated until the likelihood function no longer improves between two consecutive iterations.

Similarly to HMMs, the reestimation formulas are derived by first introducing the variable $\xi_t(i,j)$, which represents the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$, given the model and the observation sequence:
$$\xi_t(i,j) = P(s_t = S_i, s_{t+1} = S_j \mid \mathbf{x}, \lambda). \tag{36}$$
However, in the HSMM case the variable $\xi_t(i,j)$ takes into account the duration estimation performed in the forward algorithm (see (24)). Formulated in terms of the forward and backward variables, it is given by
$$\xi_t(i,j) = \frac{P(s_t = S_i, s_{t+1} = S_j, \mathbf{x} \mid \lambda)}{P(\mathbf{x} \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}(\mathbf{d}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}(\mathbf{d}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}. \tag{37}$$
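A vectorized sketch of (37) for a single time step is shown below (the array names are ours, chosen for illustration):

```python
import numpy as np

def xi_at_t(alpha_t, beta_next, A_dt, b_next):
    """Compute xi_t(i, j) of (37) from the forward/backward variables at time t.

    alpha_t   : (N,) forward variable alpha_t(i)
    beta_next : (N,) backward variable beta_{t+1}(j)
    A_dt      : (N, N) dynamic transition matrix A_{d_t}
    b_next    : (N,) emission probabilities b_j(x_{t+1})
    """
    num = alpha_t[:, None] * A_dt * (b_next * beta_next)[None, :]
    return num / num.sum()   # normalization by the double sum in the denominator of (37)
```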

From $\xi_t(i,j)$ we can derive the quantity $\gamma_t(i)$ (already defined in (17)), representing the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters:
$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j). \tag{38}$$

Finally, the reestimation formulas for the parameters $\pi$ and $A^0$ are given by
$$\bar{\pi}_i = \gamma_1(i), \tag{39}$$
$$\bar{a}^0_{ij} = \frac{\left( \sum_{t=1}^{T-1} \xi_t(i,j) \right) \odot G}{\sum_{j=1}^{N} \left( \sum_{t=1}^{T-1} \xi_t(i,j) \right) \odot G}, \tag{40}$$
where $G = [g_{ij}]$ is a square matrix of dimensions $N \times N$ with $g_{ij} = 0$ for $i = j$ and $g_{ij} = 1$ for $i \ne j$, and $\odot$ represents the element-by-element product between two matrices; $\sum_{t=1}^{T-1} \gamma_t(i)$ is the expected number of transitions from state $S_i$, and $\sum_{t=1}^{T-1} \xi_t(i,j)$ is the expected number of transitions from state $S_i$ to state $S_j$.

Equation (39) represents the expected number of times that the model starts in state $S_i$, while (40) represents the expected number of transitions from state $S_i$ to state $S_j$, with $i \ne j$, over the total expected number of transitions from state $S_i$ to any other state different from $S_i$.

Since the matrix $A^0$ is normalized, the stochastic constraints are satisfied at each iteration, that is, $\sum_{j=1}^{N} \bar{a}^0_{ij} = 1$ for each $1 \le i \le N$, while the estimate of the prior probability $\bar{\pi}_i$ inherently sums up to 1 at each iteration, since it represents the expected frequency of being in state $S_i$ at time $t = 1$,

for each $1 \le i \le N$.

With respect to the reestimation of the state duration parameters $\Theta$, we first estimate the mean $\mu_{i,d}$ and the variance $\sigma^2_{i,d}$ of the $i$th state duration, for each $1 \le i \le N$, from the forward and backward variables and the estimate of the state duration variable:
$$\mu_{i,d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right) \bar{d}_t(i)}{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right)}, \tag{41}$$
$$\sigma^2_{i,d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right) \left( \bar{d}_t(i) - \mu_{i,d} \right)^2}{\sum_{t=1}^{T-1} \alpha_t(i) \left( \sum_{j=1, j \ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j) \right)}, \tag{42}$$
where (41) can be interpreted as the probability of a transition from state $S_i$ to $S_j$, with $i \ne j$, at time $t$, weighted by the duration of state $S_i$ at $t$, giving the desired expected value, while in (42) the same quantity is weighted by the squared distance of the duration at time $t$ from its mean, giving the estimate of the variance.

The parameters of the chosen duration distribution can then be estimated from $\mu_{i,d}$ and $\sigma^2_{i,d}$. For example, if a Gamma distribution with shape parameter $\nu$ and scale parameter $\eta$ is chosen to model the state duration, the parameters $\nu_i$ and $\eta_i$, for each $1 \le i \le N$, can be calculated as $\nu_i = \mu^2_{i,d} / \sigma^2_{i,d}$ and $\eta_i = \sigma^2_{i,d} / \mu_{i,d}$.
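The moment-matching step above is straightforward to implement; the following sketch (illustrative helper, names chosen here) converts the estimated duration mean and variance of each state into Gamma shape/scale parameters:

```python
import numpy as np

def gamma_params_from_moments(mu_d, var_d):
    """Method-of-moments conversion used after (41)-(42): nu = mu^2 / sigma^2, eta = sigma^2 / mu."""
    mu_d, var_d = np.asarray(mu_d, float), np.asarray(var_d, float)
    shape = mu_d ** 2 / var_d      # nu_i
    scale = var_d / mu_d           # eta_i
    return shape, scale

# Example: an estimated sojourn mean of 100 and variance of 20 for one state
print(gamma_params_from_moments([100.0], [20.0]))   # (array([500.]), array([0.2]))
```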


Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the reestimation formulas are the same as for Hidden Markov Models [13].

In particular, for continuous observations, the parameters of the Gaussian mixture defined in (9) are reestimated by first defining the probability of being in state $S_j$ at time $t$ with the observation vector $\mathbf{x}_t$ evaluated by the $k$th mixture component as
$$\gamma_t(j,k) = \left[ \frac{\alpha_t(j)\, \beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} \right] \cdot \left[ \frac{c_{jk}\, \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_{jk}, U_{jk})}{\sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_{jm}, U_{jm})} \right]. \tag{43}$$

By using the former quantity, the parameters $c_{jk}$, $\boldsymbol{\mu}_{jk}$, and $U_{jk}$ are reestimated through the following formulas:
$$\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j,m)}, \qquad \bar{\boldsymbol{\mu}}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k) \cdot \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(j,k)}, \qquad \bar{U}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k) \cdot (\mathbf{x}_t - \bar{\boldsymbol{\mu}}_{jk})(\mathbf{x}_t - \bar{\boldsymbol{\mu}}_{jk})^T}{\sum_{t=1}^{T} \gamma_t(j,k)}, \tag{44}$$

where the superscript $T$ denotes vector transpose.

For discrete observations, the reestimation formula for the observation matrix $b_j(l)$ is
$$\bar{b}_j(l) = \frac{\sum_{t=1,\, x_t = X_l}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \tag{45}$$
where the quantity $\gamma_t(j)$, which takes into account the duration-dependent forward variable $\alpha_t(j)$, is calculated through (17). The reader is referred to Rabiner's work [13] for the interpretation of the observation parameter reestimation formulas.

3. AIC-Based Model Selection

In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states $N$, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures $M$ to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been shown that, in the case of complex models and in the presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches such as the Bayesian Information Criterion.

In general, information criteria have a two-term structure: they trade off a measure of model fitness, which is based on the likelihood of the model, against a penalty term which takes into account the model complexity. Usually, the model complexity is measured in terms of the number of parameters that have to be estimated and the number of observations.

The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as
$$\mathrm{AIC} = \frac{-\log L(\hat{\lambda}) + p}{T}, \tag{46}$$
where $L(\hat{\lambda})$ is the likelihood of the model with the estimated parameters, as defined in (25), $p$ is the number of model parameters, and $T$ is the length of the observed sequence. The best model is the one minimizing (46).

Concerning $p$, the number of parameters to be estimated for a parametric HSMM with $N$ states is $p = p_h + p_o$, where $p_h$ counts the parameters of the hidden state layer and $p_o$ those of the observation layer.

In particular, $p_h = (N-1) + (N-1) \cdot N + z \cdot N$, where

(i) $N - 1$ accounts for the prior probabilities $\pi$;
(ii) $(N-1) \cdot N$ accounts for the nonrecurrent transition matrix $A^0$;
(iii) $z \cdot N$ accounts for the duration probabilities, $z$ being the number of parameters $\theta$ of the duration distribution.

Concerning $p_o$, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with $L$ possible observable values, $p_o = (L-1) \cdot N$, which accounts for the elements of the observation matrix $B$;
(ii) if the observations are continuous and a multivariate mixture of $M$ Gaussians with $O$ variates is used as the observation model, $p_o = [O \cdot N \cdot M] + [O \cdot O \cdot N \cdot M] + [(M-1) \cdot N]$, where the terms account, respectively, for the mean vectors $\mu$, the covariance matrices $U$, and the mixture coefficients $C$.

4. Remaining Useful Lifetime Estimation

One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time $\bar{D}$ before entering a given state.

As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state $S_k$ that represents the failure condition is identified, at each moment the RUL can be defined as the expected time $\bar{D}$ to reach the failure state $S_k$. If we assume that the time to failure is a random variable $D$ following a given probability density, we define the RUL at the current time $t$ as
$$\mathrm{RUL}_t = \bar{D} = E(D), \qquad s_{t+\bar{D}} = S_k, \quad s_{t+\bar{D}-1} = S_i, \quad 1 \le i, k \le N,\ i \ne k, \tag{47}$$

where $E$ denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;

(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.

The estimation of the current state is performed via the Viterbi path, that is, the variable $\delta_t = [\delta_t(i)]_{1 \le i \le N}$ defined in (29). To correctly model the uncertainty of the current state estimate, we use the normalized variable $\bar{\delta}_t(i)$, obtained as
$$\bar{\delta}_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P(s_t = S_i \mid s_1 s_2 \cdots s_{t-1}, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda) = \frac{\delta_t(i)}{\sum_{j=1}^{N} \delta_t(j)}, \quad 1 \le i \le N, \tag{48}$$

which is an estimate of the probability of being in state $S_i$ at time $t$.

Together with the normalized variable $\bar{\delta}_t(i)$, the maximum a posteriori estimate of the current state $s^*_t$ is taken into account, according to (34). If $s^*_t$ coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimate of the average remaining time in the current state, $d_{\mathrm{avg}}(s^*_t)$, is calculated as
$$d_{\mathrm{avg}}(s^*_t) = \sum_{i=1}^{N} (\mu_{d_i} - \bar{d}_t(i)) \odot \bar{\delta}_t(i), \tag{49}$$
where $\mu_{d_i}$ denotes the expected value of the duration variable in state $S_i$ according to the duration distribution specified by the parameters $\theta_i$. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration $\bar{d}_t(i)$ at time $t$ from the expected sojourn time of state $S_i$, weighting the result by the uncertainty about the current state, $\bar{\delta}_t(i)$, and finally summing up the contributions from all states.

In addition to the average remaining time, a lower and an

upper bound can be calculated, based on the standard deviation $\sigma_{d_i}$ of the duration distribution for state $S_i$:
$$d_{\mathrm{low}}(s^*_t) = \sum_{i=1}^{N} (\mu_{d_i} - \sigma_{d_i} - \bar{d}_t(i)) \odot \bar{\delta}_t(i), \tag{50}$$
$$d_{\mathrm{up}}(s^*_t) = \sum_{i=1}^{N} (\mu_{d_i} + \sigma_{d_i} - \bar{d}_t(i)) \odot \bar{\delta}_t(i). \tag{51}$$

Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimate:
$$\bar{\delta}_{\mathrm{next}} = [\bar{\delta}_{t+\bar{d}}(i)]_{1 \le i \le N} = (A^0)^T \cdot \bar{\delta}_t, \tag{52}$$
while the maximum a posteriori estimate of the next state, $s^*_{\mathrm{next}}$, is calculated as
$$s^*_{\mathrm{next}} = s^*_{t+\bar{d}} = \arg\max_{1 \le i \le N} \bar{\delta}_{t+\bar{d}}(i). \tag{53}$$

Again, if $s^*_{t+\bar{d}}$ coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimate of the failure time is $D_{\mathrm{avg}} = d_{\mathrm{avg}}(s^*_t)$, calculated at the previous step, with the bound values $D_{\mathrm{low}} = d_{\mathrm{low}}(s^*_t)$ and $D_{\mathrm{up}} = d_{\mathrm{up}}(s^*_t)$. Otherwise, the sojourn time of the next state is estimated as follows:
$$d_{\mathrm{avg}}(s^*_{t+\bar{d}}) = \sum_{i=1}^{N} \mu_{d_i} \odot \bar{\delta}_{t+\bar{d}}(i), \tag{54}$$
$$d_{\mathrm{low}}(s^*_{t+\bar{d}}) = \sum_{i=1}^{N} (\mu_{d_i} - \sigma_{d_i}) \odot \bar{\delta}_{t+\bar{d}}(i), \tag{55}$$
$$d_{\mathrm{up}}(s^*_{t+\bar{d}}) = \sum_{i=1}^{N} (\mu_{d_i} + \sigma_{d_i}) \odot \bar{\delta}_{t+\bar{d}}(i). \tag{56}$$

This procedure is repeated until the failure state is encountered in the prediction of the next state. The RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:
$$D_{\mathrm{avg}} = \sum d_{\mathrm{avg}}, \tag{57}$$
$$D_{\mathrm{low}} = \sum d_{\mathrm{low}}, \tag{58}$$
$$D_{\mathrm{up}} = \sum d_{\mathrm{up}}. \tag{59}$$

Finally, Algorithm 1 details the RUL estimation procedure described above.

5. Experimental Results

To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real-case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


function RulEstimation(x_t, S_k)            ⊳ x_t: the last observation acquired; S_k: the failure state
  Initialization:
    D_avg ← 0;  D_low ← 0;  D_up ← 0
  Current state estimation:
    calculate δ̄_t                           ⊳ using (48)
    calculate s*_t                           ⊳ using (34)
    calculate d̄_t                           ⊳ using (20)
    S ← s*_t
  Loop:
    while S ≠ S_k do
      calculate d_avg                        ⊳ using (49) or (54)
      calculate d_low                        ⊳ using (50) or (55)
      calculate d_up                         ⊳ using (51) or (56)
      D_avg ← D_avg + d_avg;  D_low ← D_low + d_low;  D_up ← D_up + d_up
      calculate δ̄_next                       ⊳ using (52)
      calculate s*_next                       ⊳ using (53)
      S ← s*_next
    end while
  return D_avg, D_low, D_up

Algorithm 1: Remaining Useful Lifetime estimation (pseudo-code).
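For illustration, a Python transcription of Algorithm 1 could be sketched as below; it is not the authors' implementation, and it assumes as inputs the normalized Viterbi probabilities `delta_bar`, the duration estimates `d_bar`, the duration means and standard deviations `mu_d` and `sigma_d`, the nonrecurrent matrix `A0`, and the failure state index `k`:

```python
import numpy as np

def rul_estimation(delta_bar, d_bar, mu_d, sigma_d, A0, k, max_hops=100):
    """Sketch of Algorithm 1: accumulate expected remaining sojourn times until the failure state."""
    D_avg = D_low = D_up = 0.0
    s_star = int(np.argmax(delta_bar))
    first = True                                                   # first pass uses (49)-(51), later ones (54)-(56)
    # max_hops is a safety guard added here (not in Algorithm 1) against unreachable failure states
    while s_star != k and max_hops > 0:
        if first:
            D_avg += np.sum((mu_d - d_bar) * delta_bar)            # (49)
            D_low += np.sum((mu_d - sigma_d - d_bar) * delta_bar)  # (50)
            D_up  += np.sum((mu_d + sigma_d - d_bar) * delta_bar)  # (51)
            first = False
        else:
            D_avg += np.sum(mu_d * delta_bar)                      # (54)
            D_low += np.sum((mu_d - sigma_d) * delta_bar)          # (55)
            D_up  += np.sum((mu_d + sigma_d) * delta_bar)          # (56)
        delta_bar = A0.T @ delta_bar                               # (52)
        s_star = int(np.argmax(delta_bar))                         # (53)
        max_hops -= 1
    return D_avg, D_low, D_up
```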

5.1. Simulated Experiment. The data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.

5.1.1. Data Generation. The industrial machine that is the subject of these experiments has been modeled as a left-right parametric HSMM with $N = 5$ states, having state $S_5$ as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective in solving automatic model selection, online condition monitoring, and prediction problems. For this purpose, we divided the experiments into two cases, according to the nature of the observations, continuous or discrete.

For each of the continuous and discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as the training set and 10 series as the testing set. Each time series contains $T = 650$ observations. The parameters used to generate the data are taken from the work of Lee et al. [15]

and are adapted to obtain an equivalent left-right parametric HSMM as follows:
$$\pi = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad A^0 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},$$
$$\Theta_{N} = \{\theta_1 = [100, 20],\ \theta_2 = [90, 15],\ \theta_3 = [100, 20],\ \theta_4 = [80, 25],\ \theta_5 = [200, 1]\},$$
$$\Theta_{G} = \{\theta_1 = [500, 0.2],\ \theta_2 = [540, 0.1667],\ \theta_3 = [500, 0.2],\ \theta_4 = [256, 0.3125],\ \theta_5 = [800, 0.005]\},$$
$$\Theta_{W} = \{\theta_1 = [102, 28],\ \theta_2 = [92, 29],\ \theta_3 = [102, 28],\ \theta_4 = [82, 20],\ \theta_5 = [200, 256]\}, \tag{60}$$
where $\Theta_{N}$, $\Theta_{G}$, and $\Theta_{W}$ are the distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean $\mu_d$ and the variance $\sigma^2_d$ of the Gaussian distribution, the shape $\nu_d$ and the scale $\eta_d$ of the Gamma distribution, and the scale $a_d$ and the shape $b_d$ of the Weibull distribution.


[Figure 2, panels (a) and (b): examples of the generated series, each panel showing the hidden state sequence, the state duration, and the observations over 650 time steps; (a) example of simulated data for the continuous case (observed signal), (b) example of simulated data for the discrete case (observed symbols).]

Figure 2: The data generated with the parameters described in Section 5.1.1, both for the continuous and the discrete case.

It must be noticed that, as explained in Section 2.1, for state $S_5$, being the absorbing state, the duration parameters $\theta_5$ have no influence on the data, since once state $S_5$ is reached the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:

$$\boldsymbol{\mu}_1 = \begin{bmatrix} 20 \\ 20 \end{bmatrix}, \quad \boldsymbol{\mu}_2 = \begin{bmatrix} 20 \\ 35 \end{bmatrix}, \quad \boldsymbol{\mu}_3 = \begin{bmatrix} 35 \\ 35 \end{bmatrix}, \quad \boldsymbol{\mu}_5 = \begin{bmatrix} 28 \\ 28 \end{bmatrix},$$
$$U_1 = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix}, \quad U_2 = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix}, \quad U_3 = \begin{bmatrix} 15 & -2 \\ -2 & 15 \end{bmatrix}, \quad U_4 = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \quad U_5 = \begin{bmatrix} 10 & 3 \\ 3 & 10 \end{bmatrix}, \tag{61}$$

while for the discrete case, $L = 7$ distinct observation symbols have been taken into consideration, with the following observation probability distribution:
$$B = \begin{bmatrix} 0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\ 0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\ 0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\ 0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\ 0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1 \end{bmatrix}. \tag{62}$$

An example of the simulated data, both for the continuous and the discrete case, is shown in Figure 2, where a Gaussian duration model has been used.

5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a different HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states $N$ from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures $M$ in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fit [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters $\lambda^0$, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters $\lambda^*$ corresponding to the minimum AIC value has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.

The obtained results are shown in Figure 3 for both the continuous and discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered the optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.

5.1.3. Condition Monitoring. The optimal parameters $\lambda^*$ obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, $\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t$. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state $s^*_t = \arg\max_{1 \le i \le N} [\delta_t(i)]$, as specified in (34).


[Figure 3, panels (a)–(f): AIC values versus the number of states (2–8) for the Gaussian, Gamma, and Weibull candidate duration distributions; (a) continuous data and Gaussian duration distribution, (b) continuous data and Gamma duration distribution, (c) continuous data and Weibull duration distribution, (d) discrete data and Gaussian duration distribution, (e) discrete data and Gamma duration distribution, (f) discrete data and Weibull duration distribution.]

Figure 3: Akaike Information Criterion (AIC) applied to continuous and discrete observation data. AIC is effective for automatic model selection, since its minimum value recovers the same number of states and duration model used to generate the data.

An example of the condition monitoring experiment is shown in Figure 4, for continuous and discrete observations, respectively. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) a Gaussian distribution is used. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the state estimated by the Viterbi algorithm, and the third display represents the observed time series.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of the testing cases.


[Figure 4, panels (a) and (b): true state sequence, Viterbi-estimated state sequence (with correct and wrong guesses marked), and observations over about 650 time steps; (a) state estimation with the Viterbi path for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985), (b) state estimation with the Viterbi path for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992).]

Figure 4: Condition monitoring using the Viterbi path. HSMMs can effectively solve condition monitoring problems in time-dependent applications, thanks to their high accuracy in hidden state recognition.

The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.

5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase we considered state $S_5$ as the failure state and used the trained parameters $\lambda^*$ of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations $\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t$ up to time $t$. When a new observation is acquired, after the current state probability $\bar{\delta}_t(i)$ is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations with the duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations with the duration modeled by a Gamma distribution. From the figures one can notice that the average as well as the lower and upper bound estimates converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimate decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and lower bounds become narrower as the failure state approaches, and the estimate becomes more precise, until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.

To quantitatively estimate the performance of our meth-odology for the RUL estimation we considered at each time119905 the absolute prediction error (APE) between the real RULand the predicted value defined as

APE (119905) = 1003816100381610038161003816RULreal (119905) minus RUL (119905)1003816100381610038161003816 (63)

where RULreal(119905) is the (known) value of the RUL at time119905 while RUL(119905) is RUL predicted by the model To evaluatethe overall performance of our methodology we consideredthe average absolute prediction error of the RUL estimationdefined as

APE =sum

119879

119905=1APE (119905)119879

(64)

where 119879 is the length of the testing signal APE being aprediction error average values of (64) close to zero corre-spond to good predictive performances
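Equations (63) and (64) translate directly into a couple of lines of code; a minimal sketch (our helper names, assuming the real and predicted RUL trajectories are stored as arrays of equal length):

```python
import numpy as np

def ape(rul_real, rul_pred):
    """Absolute prediction error at each time step, Eq. (63)."""
    return np.abs(np.asarray(rul_real) - np.asarray(rul_pred))

def mean_ape(rul_real, rul_pred) -> float:
    """Average absolute prediction error over the test signal, Eq. (64)."""
    return float(np.mean(ape(rul_real, rul_pred)))
```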

The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b) for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which, respectively, the prediction errors obtained for continuous and discrete observations are reported.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi.


[Figure 5 shows, for each case, the true RUL together with the upper, average, and lower RUL estimates over time. (a) Remaining Useful Lifetime estimation for continuous data and a Weibull duration distribution. (b) Remaining Useful Lifetime estimation for discrete data and a Gamma duration distribution.]

Figure 5: HSMMs effectively solve RUL estimation problems. The prediction converges to the actual RUL value and its uncertainty decreases as the real failure time approaches.

Table 1: State recognition accuracy (%).

(a) Continuous observations

Test case    Gaussian    Gamma    Weibull
1            99.4        98.5     99.2
2            99.7        98.6     99.5
3            99.4        99.2     99.7
4            98.9        98.9     99.7
5            98.2        98.9     100
6            99.1        98.8     99.7
7            98.5        99.4     99.7
8            99.2        99.1     99.5
9            99.2        98.6     99.7
10           99.2        99.1     99.5
Average      99.1        98.9     99.6

(b) Discrete observations

Test case    Gaussian    Gamma    Weibull
1            97.4        96.7     97.4
2            97.2        97.6     96.5
3            99.4        95.8     96.6
4            98.2        95.3     97.7
5            99.1        97.4     97.5
6            97.8        97.7     97.8
7            95.8        97.2     96.6
8            97.7        96.4     97.2
9            98.9        97.2     98.5
10           99.2        95.6     96.9
Average      98.1        96.7     97.3

This is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi given by (16).

5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).


Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20).

(a) APE of the RUL estimation for the continuous observation test cases

Test case    Gaussian (APEavg / APEup / APElow)    Gamma (APEavg / APEup / APElow)    Weibull (APEavg / APEup / APElow)
1            5.1 / 17.0 / 6.7                      14.0 / 29.0 / 0.91                 4.5 / 17.0 / 8.1
2            7.6 / 19.0 / 5.0                      6.1 / 21.0 / 8.5                   6.6 / 19.0 / 6.1
3            7.7 / 5.4 / 19.0                      2.9 / 12.0 / 17.0                  16.0 / 29.0 / 3.0
4            9.0 / 21.0 / 2.9                      7.5 / 22.0 / 6.8                   6.0 / 19.0 / 6.7
5            7.3 / 19.0 / 4.7                      2.2 / 14.0 / 14.0                  3.9 / 17.0 / 8.7
6            6.5 / 18.0 / 5.6                      5.1 / 18.0 / 10.0                  14.0 / 27.0 / 2.7
7            4.7 / 16.0 / 7.5                      4.8 / 17.0 / 11.0                  1.2 / 13.0 / 12.0
8            10.0 / 22.0 / 2.9                     5.2 / 18.0 / 10.0                  9.2 / 22.0 / 3.9
9            3.1 / 9.2 / 14.0                      2.0 / 16.0 / 13.0                  8.2 / 21.0 / 4.9
10           6.4 / 18.0 / 5.6                      7.5 / 22.0 / 6.9                   3.3 / 12.0 / 13.0
Average      6.8 / 17.0 / 7.4                      5.7 / 19.0 / 9.9                   7.3 / 20.0 / 7.0

(b) APE of the RUL estimation for the discrete observation test cases

Test case    Gaussian (APEavg / APEup / APElow)    Gamma (APEavg / APEup / APElow)    Weibull (APEavg / APEup / APElow)
1            2.1 / 11.0 / 14.0                     3.1 / 8.8 / 14.0                   2.4 / 12.0 / 13.0
2            2.1 / 11.0 / 13.0                     11.0 / 22.0 / 3.3                  19.0 / 32.0 / 7.1
3            5.1 / 17.0 / 7.6                      6.6 / 18.0 / 5.1                   2.3 / 14.0 / 11.0
4            5.9 / 6.5 / 18.0                      5.2 / 17.0 / 6.7                   4.2 / 16.0 / 9.0
5            3.2 / 14.0 / 10.0                     8.3 / 19.0 / 3.4                   12.0 / 24.0 / 2.9
6            12.0 / 24.0 / 2.7                     6.2 / 18.0 / 5.2                   4.1 / 8.4 / 16.0
7            2.9 / 15.0 / 9.7                      9.3 / 21.0 / 2.3                   19.0 / 31.0 / 6.6
8            15.0 / 27.0 / 7.0                     7.4 / 18.0 / 4.3                   4.3 / 17.0 / 9.4
9            5.9 / 18.0 / 7.7                      11.0 / 23.0 / 5.5                  3.9 / 16.0 / 8.8
10           3.5 / 11.0 / 14.0                     5.5 / 6.0 / 16.0                   5.2 / 17.0 / 7.1
Average      5.7 / 15.0 / 10.0                     7.4 / 17.0 / 6.6                   7.7 / 19.0 / 9.0

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human machine interface.

The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.

Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32].

(a) APE of the RUL estimation for the continuous observation test cases

Test case    Gaussian (APEavg / APEup / APElow)    Gamma (APEavg / APEup / APElow)    Weibull (APEavg / APEup / APElow)
1            57.8 / 51.0 / 66.8                    26.2 / 9.7 / 52.7                  25.9 / 28.4 / 64.6
2            50.2 / 44.4 / 57.7                    21.3 / 17.0 / 46.9                 29.0 / 19.2 / 70.8
3            50.3 / 44.7 / 57.3                    27.1 / 8.7 / 56.5                  34.5 / 13.9 / 73.4
4            51.8 / 46.0 / 60.4                    21.3 / 14.3 / 45.9                 34.9 / 17.1 / 78.7
5            59.4 / 53.7 / 66.2                    29.0 / 9.5 / 55.4                  33.4 / 15.6 / 74.9
6            58.0 / 51.7 / 67.1                    25.8 / 8.3 / 54.1                  23.1 / 25.8 / 66.5
7            59.4 / 53.6 / 66.9                    18.2 / 12.5 / 47.7                 36.0 / 17.1 / 74.4
8            63.4 / 55.6 / 72.3                    19.4 / 15.7 / 44.1                 34.8 / 17.8 / 77.0
9            49.1 / 43.5 / 57.0                    14.5 / 17.1 / 43.2                 25.1 / 26.7 / 67.0
10           54.4 / 48.4 / 62.8                    23.2 / 7.9 / 52.7                  24.1 / 24.5 / 67.4
Average      55.4 / 49.3 / 63.5                    22.6 / 12.1 / 49.9                 30.1 / 20.6 / 71.5

(b) APE of the RUL estimation for the discrete observation test cases

Test case    Gaussian (APEavg / APEup / APElow)    Gamma (APEavg / APEup / APElow)    Weibull (APEavg / APEup / APElow)
1            51.4 / 41.0 / 62.4                    42.4 / 31.8 / 53.0                 32.6 / 26.4 / 73.6
2            49.6 / 39.9 / 60.4                    59.5 / 48.3 / 70.8                 31.3 / 27.6 / 69.3
3            50.2 / 38.6 / 62.3                    46.5 / 35.7 / 57.4                 32.4 / 25.7 / 70.2
4            42.2 / 31.5 / 53.8                    50.1 / 40.5 / 60.6                 23.7 / 36.1 / 60.3
5            44.3 / 33.9 / 55.8                    47.8 / 37.4 / 59.1                 36.0 / 25.6 / 76.5
6            52.2 / 43.2 / 62.7                    55.2 / 44.3 / 66.9                 27.2 / 31.6 / 64.3
7            55.0 / 43.9 / 66.8                    56.0 / 45.7 / 67.0                 34.7 / 23.2 / 74.4
8            50.3 / 39.0 / 62.0                    60.4 / 50.5 / 71.0                 35.1 / 26.4 / 72.4
9            55.5 / 47.4 / 64.0                    48.0 / 37.2 / 59.5                 31.8 / 22.2 / 73.6
10           49.0 / 38.2 / 60.7                    52.1 / 41.2 / 63.1                 29.4 / 28.9 / 68.7
Average      50.0 / 39.7 / 61.1                    51.8 / 41.3 / 62.9                 31.4 / 27.4 / 70.4

The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).
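As a quick sanity check on the acquisition settings quoted above (our arithmetic, not part of the original text):

```python
sampling_rate_hz = 25_600       # 25.6 kHz accelerometer sampling frequency
snapshot_length_s = 0.1         # each vibration snapshot lasts 0.1 s
print(int(sampling_rate_hz * snapshot_length_s))  # 2560 samples per snapshot

temperature_rate_hz = 10        # temperature sampled at 10 Hz
print(temperature_rate_hz * 60)                   # 600 samples per minute
```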


Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1 (1800 rpm and 4000 N): Bearing1_1: 28030; Bearing1_2: 8710; Bearing1_3: 23750; Bearing1_4: 14280; Bearing1_5: 24630; Bearing1_6: 24480; Bearing1_7: 22590
Condition 2 (1650 rpm and 4200 N): Bearing2_1: 9110; Bearing2_2: 7970; Bearing2_3: 19550; Bearing2_4: 7510; Bearing2_5: 23110; Bearing2_6: 7010; Bearing2_7: 2300
Condition 3 (1500 rpm and 5000 N): Bearing3_1: 5150; Bearing3_2: 16370; Bearing3_3: 4340

[Figure 6 shows the platform with its load module, the tested bearing, the data acquisition module, and the rotating module.]

Figure 6: Global overview of the Pronostia experimental platform [19].

Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal was higher than 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].

In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots (\(L = 2560\)). Let \(r_w(t)\) be the raw signal of the \(w\)th window; for each \(w\) we estimate the RMS as
\[ x^{\mathrm{RMS}}_{w} = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r_w^2(t)} \]
and the kurtosis as
\[ x^{\mathrm{KURT}}_{w} = \frac{(1/L)\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^4}{\left((1/L)\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^2\right)^2}, \]
where \(\bar{r}_w\) is the mean of \(r_w\). An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM, after the model selection procedure we implemented a leave-one-out cross validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
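The two window-level features above can be computed directly from the raw accelerometer signal; a minimal sketch (our function name; it assumes the raw signal is a one-dimensional array and uses the non-excess kurtosis definition given in the text):

```python
import numpy as np

def window_features(raw, window: int = 2560):
    """Split a raw vibration signal into windows and compute RMS and kurtosis."""
    raw = np.asarray(raw, dtype=float)
    n_windows = len(raw) // window
    rms, kurt = [], []
    for w in range(n_windows):
        r = raw[w * window:(w + 1) * window]
        rms.append(np.sqrt(np.mean(r ** 2)))           # x_w^RMS
        centered = r - r.mean()
        kurt.append(np.mean(centered ** 4) / np.mean(centered ** 2) ** 2)  # x_w^KURT
    return np.array(rms), np.array(kurt)
```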

5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (being Gaussian, Gamma, or Weibull), (ii) an increasing number of states \(N\), from 2 to 6, and (iii) an increasing number of Gaussian mixtures \(M\) in the observation density, from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations \(\lambda_0\), on the data sets (Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).


[Figure 7 shows a photograph of a tested bearing before and after the experiment, together with the vibration signal recorded during the whole test.]

Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].

[Figure 8 shows the raw signal r(t) split into windows 1, 2, ..., n, from which the extracted RMS and kurtosis features are plotted over time (s).]

Figure 8: Raw vibration data (a) versus RMS and kurtosis features (b) for Bearing1_1.

Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with \(N = 4\) states, a Weibull duration model, and an \(M = 1\) Gaussian mixture for the observation density.
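The model selection step above is essentially a grid search over (duration family, N, M) with repeated random restarts and AIC as the score. A minimal sketch, assuming the surrounding code provides a training routine and an AIC evaluator (the names `fit_hsmm` and `aic` are placeholders of ours, not a real library API):

```python
import itertools

def select_structure(train_data, fit_hsmm, aic, n_restarts=120):
    """Grid search over duration family, number of states N, and mixtures M.

    fit_hsmm(data, family, n_states, n_mixtures, seed) -> trained model;
    aic(model, data) -> AIC value (Eq. (46)).
    """
    best = None
    for family, n_states, n_mix in itertools.product(
            ("gaussian", "gamma", "weibull"), range(2, 7), range(1, 5)):
        for seed in range(n_restarts):            # 120 random initializations
            model = fit_hsmm(train_data, family, n_states, n_mix, seed)
            score = aic(model, train_data)
            if best is None or score < best[0]:   # keep the minimum-AIC structure
                best = (score, family, n_states, n_mix, model)
    return best
```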

(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme by using, for condition 1, at each iteration, Bearing1_i, \(1 \le i \le 7\), as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters \(\lambda^*_i\) were estimated for the \(i\)th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate at each time \(t\) the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state \(S_4\) as the failure state. The same procedure has been performed for the bearings in condition 2.
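The leave-one-out scheme can be sketched as a simple split-and-train loop (our helper names; `train_hsmm` stands for the learning procedure with the structure selected above, and `online_rul` for the online prediction routine sketched earlier):

```python
def leave_one_out(bearings, train_hsmm, online_rul):
    """Leave-one-out evaluation over the bearings of one operating condition.

    bearings: list of observation sequences, one per bearing.
    """
    results = []
    for i, test_seq in enumerate(bearings):
        train_seqs = [b for j, b in enumerate(bearings) if j != i]
        model = train_hsmm(train_seqs)      # lambda*_i trained on the other six
        results.append(online_rul(test_seq, model))
    return results
```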

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.


[Figure 9 plots the AIC values obtained with the Gaussian, Gamma, and Weibull duration models as a function of the number of states (2 to 6). (a) AIC values for condition 1. (b) AIC values for condition 2.]

Figure 9: In both cases the minimum AIC value corresponds to an HSMM with \(N = 4\) states, a Weibull duration model, and \(M = 1\) mixture in the observation density.

[Figure 10 plots the true, upper, average, and lower RUL over time (s), up to the failure times of 22590 s and 7010 s, respectively. (a) RUL estimation for Bearing1_7. (b) RUL estimation for Bearing2_6.]

Figure 10: By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.

Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability of the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.
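For reference, the figures quoted above follow from the Table 5 averages by simple unit conversion (our arithmetic):
\[ 4549.4\ \mathrm{s} \approx 75.8\ \mathrm{min} \approx 1\ \mathrm{h}\ 15\ \mathrm{min}, \qquad 3903.3\ \mathrm{s} \approx 65.1\ \mathrm{min} \approx 1\ \mathrm{h}\ 5\ \mathrm{min}, \]
\[ 1388.0\ \mathrm{s} \approx 23\ \mathrm{min}\ \text{(Bearing1\_7)}, \qquad 877.1\ \mathrm{s} \approx 14.6\ \mathrm{min}\ \text{(Bearing2\_6)}. \]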

6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time to event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required parameters to estimate.


Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1

Test bearing    APEavg     APElow     APEup
Bearing1_1      10571.6    12723.0    9414.6
Bearing1_2      4331.2     3815.6     3821.3
Bearing1_3      2997.0     9730.9     6091.2
Bearing1_4      6336.3     2876.6     14871.9
Bearing1_5      1968.9     7448.4     10411.5
Bearing1_6      4253.0     9896.4     9793.7
Bearing1_7      1388.0     7494.3     10088.1
Average         4549.4     7712.2     9213.2

(b) Condition 2

Test bearing    APEavg     APElow     APEup
Bearing2_1      2475.9     5006.5     7287.5
Bearing2_2      1647.3     4497.2     8288.6
Bearing2_3      8877.1     9508.3     7962.1
Bearing2_4      1769.8     4248.6     4982.5
Bearing2_5      8663.1     10490.0    10730.0
Bearing2_6      877.1      3504.7     6687.0
Bearing2_7      3012.5     3866.4     6651.9
Average         3903.3     5874.5     7512.8

Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as
\[ \hat{d}_{t+1}(i) = \frac{a_{ii}(\hat{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\hat{d}_t(i) + 1\right). \tag{A.1} \]

The random variable \(d_t(i)\) has been defined in Section 2.1 as the duration spent in state \(i\) prior to the current time \(t\), assuming that the state at the current time \(t\) is \(i\); \(d_t(i)\) is sampled from an arbitrary distribution
\[ d_t(i) \sim f(d). \tag{A.2} \]

We can specify the probability that the system has been in state \(i\) for \(d\) time units prior to the current time \(t\), given the observations and the model parameters \(\lambda\), and knowing that the current state is \(i\), as
\[ \mathrm{P}\left(d_t(i) = d\right) = \mathrm{P}\left(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \lambda\right). \tag{A.3} \]

We omit the conditioning on the model parameters \(\lambda\) in the following equations, it being inherently implied. We are interested in deriving the estimator \(\hat{d}_t(i)\) of \(d_t(i)\), defined as its expected value (see Equation (15)):
\[ \hat{d}_t(i) = \mathrm{E}\left(d_t(i) \mid s_t = S_i, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t\right), \quad 1 \le i \le N. \tag{A.4} \]

From the definition of expectation we have
\[ \hat{d}_t(i) = \sum_{d=1}^{t} d \cdot \mathrm{P}\left(d_t(i) = d\right) = \sum_{d=1}^{t} d \cdot \mathrm{P}\left(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right). \tag{A.5} \]

For \(\hat{d}_{t+1}(i)\) we have
\[ \hat{d}_{t+1}(i) = \sum_{d=1}^{t+1} d \cdot \mathrm{P}\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \underbrace{\mathrm{P}\left(s_{t-1} \neq S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{(a)} \tag{A.6} \]
\[ + \sum_{d=2}^{t+1} d \cdot \mathrm{P}\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right). \tag{A.7} \]

By noticing that
\[ \mathrm{P}\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \frac{\mathrm{P}\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}{\mathrm{P}\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}, \tag{A.8} \]
we can replace the probability in the second term of (A.7) with
\[ \mathrm{P}\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \underbrace{\mathrm{P}\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{(b)} \tag{A.9} \]
\[ \cdot \ \mathrm{P}\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right). \tag{A.10} \]

In the last factor of (A.10) we can omit the information about the current state and observation by observing that
\[ \mathrm{P}\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \approx \underbrace{\mathrm{P}\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right)}_{(c)} \tag{A.11} \]
if the following independencies hold:
\[ s_{t+1} \perp \left\{s_{t-d+1}, \ldots, s_{t-1}\right\} \mid \left\{s_t, \mathbf{x}_1, \ldots, \mathbf{x}_t\right\}, \qquad \mathbf{x}_{t+1} \perp \left\{s_{t-d+1}, \ldots, s_{t-1}\right\} \mid \left\{s_t, \mathbf{x}_1, \ldots, \mathbf{x}_t\right\}, \tag{A.12} \]
where with \(\perp\) we denote independence. Equation (A.12) holds for HMMs (even without conditioning on \(\mathbf{x}_1, \ldots, \mathbf{x}_t\)), but it does not hold for HSMMs, since the state duration (expressed by \(s_{t-d+1}, \ldots, s_{t-1}\)) determines the system evolution. On the other hand, the state duration is partially known from the observations \(\mathbf{x}_1, \ldots, \mathbf{x}_t\). Thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A.6), (A.9), and (A.11) we obtain
\[
\begin{aligned}
\hat{d}_{t+1}(i) ={}& (a) + \sum_{d=2}^{t+1} d \cdot (b) \cdot (c) \\
={}& \underbrace{\mathrm{P}\left(s_{t-1} \neq S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{\mathrm{P}(A,B \mid C) = \mathrm{P}(A \mid B,C)\,\mathrm{P}(B \mid C)} \\
&+ \sum_{d=2}^{t+1} d \cdot \underbrace{\mathrm{P}\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{\text{does not depend on } d} \cdot \mathrm{P}\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \\
={}& \mathrm{P}\left(s_{t-1} \neq S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot \mathrm{P}\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \\
&+ \mathrm{P}\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot \sum_{d=2}^{t+1} d \cdot \mathrm{P}\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \\
={}& \mathrm{P}\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot \Bigg[ \underbrace{\mathrm{P}\left(s_{t-1} \neq S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1}\right)}_{\approx\, \mathrm{P}(s_{t-1} \neq S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t)\ \text{by the approximation of (A.11)}} \\
&+ \sum_{d'=1}^{t} (d'+1) \cdot \mathrm{P}\left(s_{t-d'-1} \neq S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \Bigg].
\end{aligned} \tag{A.13}
\]

Noticing that
\[ \sum_{d'=1}^{t} \mathrm{P}\left(s_{t-d'-1} \neq S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) + \mathrm{P}\left(s_{t-1} \neq S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) = 1, \tag{A.14} \]
because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time \(t\), we can rewrite (A.13) as follows:
\[ \hat{d}_{t+1}(i) = \mathrm{P}\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot \left(\hat{d}_t(i) + 1\right). \tag{A.15} \]

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted by the "amount" of the current state that was already in state \(i\) in the previous step.

In order to express (A.15) in terms of model parameters, for an easy numerical calculation of the induction for \(\hat{d}_{t+1}(i)\), we can consider the following equality:
\[ \mathrm{P}\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \frac{\mathrm{P}\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}{\underbrace{\mathrm{P}\left(s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{\gamma_{t+1}(i)}}. \tag{A.16} \]

If we consider the terms involved in the probability at the numerator of the right-hand side of (A.16), we have that
\[ \underbrace{\mathbf{x}_1, \ldots, \mathbf{x}_t}_{B} \perp \underbrace{\mathbf{x}_{t+1}}_{C} \mid \underbrace{s_t = S_i, s_{t+1} = S_i}_{A}. \tag{A.17} \]
If \(B \perp C \mid A\), by the Bayes rule we have that
\[ \mathrm{P}(A \mid C, B) = \frac{\mathrm{P}(C \mid A, B) \cdot \mathrm{P}(A \mid B)}{\mathrm{P}(C \mid B)}. \tag{A.18} \]
Hence we can rewrite the numerator of the right-hand side of (A.16) as follows:
\[
\begin{aligned}
\mathrm{P}\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) &= \frac{\mathrm{P}\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \cdot \overbrace{\mathrm{P}\left(\mathbf{x}_{t+1} \mid s_t = S_i, s_{t+1} = S_i\right)}^{\mathbf{x}_{t+1} \perp s_t \mid s_{t+1}}}{\mathrm{P}\left(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right)} \\
&= \frac{\mathrm{P}\left(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \cdot \overbrace{\mathrm{P}\left(s_t = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right)}^{\gamma_t(i)} \cdot \overbrace{\mathrm{P}\left(\mathbf{x}_{t+1} \mid s_{t+1} = S_i\right)}^{b_i(\mathbf{x}_{t+1})}}{\mathrm{P}\left(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right)}.
\end{aligned} \tag{A.19}
\]

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as
\[ \mathrm{P}\left(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) = \sum_{\mathbf{d}_t} a_{ii}(\mathbf{d}_t) \cdot \mathrm{P}\left(\mathbf{d}_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \approx a_{ii}(\hat{\mathbf{d}}_t), \tag{A.20} \]
while the denominator of (A.19) can be expressed as follows:
\[ \mathrm{P}\left(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right) = \frac{\mathrm{P}\left(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1}\right)}{\mathrm{P}\left(\mathbf{x}_1, \ldots, \mathbf{x}_t\right)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}. \tag{A.21} \]

By substituting (A.20) and (A.21) in (A.19) we obtain
\[ \mathrm{P}\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \frac{a_{ii}(\hat{\mathbf{d}}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)}, \tag{A.22} \]
and then, by combining (A.22) and (A.16), we obtain
\[ \mathrm{P}\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \frac{a_{ii}(\hat{\mathbf{d}}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \cdot \sum_{i=1}^{N} \alpha_{t+1}(i)}. \tag{A.23} \]

Finally, by substituting (A.23) in (A.15) and considering that
\[ \gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)}, \tag{A.24} \]
we derive the induction formula for \(\hat{d}_{t+1}(i)\) in terms of model parameters as
\[ \hat{d}_{t+1}(i) = \frac{a_{ii}(\hat{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\hat{d}_t(i) + 1\right). \tag{A.25} \]
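Once the forward variables are available, one step of the induction (A.25) is cheap to evaluate for all states at once. A minimal sketch (array names and shapes are ours; the small floor on the denominator is only a numerical guard):

```python
import numpy as np

def update_duration(d_hat, a_ii, alpha_t, alpha_t1, b_x_next):
    """One step of the duration induction of (A.25), vectorized over states.

    d_hat:    current average durations d_hat_t(i), shape (N,)
    a_ii:     duration-dependent self-transition probabilities a_ii(d_hat_t), shape (N,)
    alpha_t:  forward variables at time t, shape (N,)
    alpha_t1: forward variables at time t+1, shape (N,)
    b_x_next: emission likelihoods b_i(x_{t+1}), shape (N,)
    """
    weight = a_ii * alpha_t * b_x_next / np.maximum(alpha_t1, 1e-300)
    return weight * (d_hat + 1.0)
```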

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka, Japan and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines - prognostics - part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 2006.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.


Page 3: Hidden Semi-Markov Models for Predictive Maintenance

Mathematical Problems in Engineering 3

21 Model Specification A Hidden Semi-Markov Model isa doubly embedded stochastic model with an underlyingstochastic process that is not observable (hidden) but canonly be observed through another set of stochastic processesthat produce the sequence of observations HSMMallows theunderlying process to be a semi-Markov chain with a variableduration or sojourn time for each state The key conceptof HSMMs is that the semi-Markov property holds for thismodel while in HMMs the Markov property implies that thevalue of the hidden state at time 119905 depends exclusively on itsvalue of time 119905 minus 1 in HSMMs the probability of transitionfrom state 119878

119895to state 119878

119894at time 119905depends on the duration spent

in state 119878119895prior to time 119905

In the following we denote the number of states in themodel as119873 the individual states as 119878 = 119878

1 119878

119873 and the

state at time 119905 as 119904119905 The semi-Markov property can be written

as

P (119904119905+1

= 119878119894| 119904

119905= 119878

119895 119904

1= 119878

119896)

= P (119904119905+1

= 119894 | 119904119905= 119895 119889

119905(119895)) 1 le 119894 119895 119896 le 119873

(1)

where the duration variable 119889119905(119895) is defined as the time spent

in state 119878119895prior to time 119905

Although the state duration is inherently discrete inmany studies [44 45] it has been modeled with a continuousparametric density function Similar to the work of Azimiet al [30ndash32] in this paper we use the discrete counterpartof the chosen parametric probability density function (pdf)With this approximation if we denote the pdf of the sojourntime in state 119878

119894as 119891(119909 120579

119894) where 120579

119894represents the set of

parameters of the pdf relative to the 119894th state the probabilitythat the system stays in state 119878

119894for exactly 119889 time steps

can be calculated as int119889

119889minus1119891(119909 120579

119894)119889119909 Considering the HSMM

formulation we can generally denote the state dependentduration distributions by the set of their parameters relativeto each state as Θ = 120579

1 120579

119873

Many related works on HSMMs [31 32 44 45] consider119891(119909 120579

119894) within the exponential family In particular Gamma

distributions are oftenused in speech processing applicationsIn this work we do not impose a type of distribution functionto model the duration The only requirement is that theduration should be modeled as a positive function beingnegative durations physically meaningless

HSMMs require also the definition of a ldquodynamicrdquo tran-sitionmatrix as a consequence of the semi-Markov propertyDifferently from the HMMs in which a constant transitionprobability leads to a geometric distributed state sojourntime HSMMs explicitly define a transition matrix whichdepending on the duration variable has increasing probabil-ities of changing state as the time goes on For conveniencewe specify the state duration variable in a form of a vector d

119905

with dimensions119873 times 1 as

d119905=

119889119905(119895) if 119904

119905= 119878

119895

1 if 119904119905

= 119878119895

(2)

The quantity 119889119905(119895) can be easily calculated by induction from

119889119905minus1

(119895) as

119889119905(119895) = 119904

119905(119895) sdot 119904

119905minus1(119895) sdot 119889

119905minus1(119895) + 1 (3)

where 119904119905(119895) is 1 if 119904

119905= 119878

119895 0 otherwise

If we assume that at time 119905 the system is in state 119878119894 we can

formally define the duration-dependent transition matrix asAd119905

= [119886119894119895(d

119905)] with

119886119894119895(d

119905) = P (119904

119905+1= 119878

119895| 119904

119905= 119878

119894 119889

119905(119894)) 1 le 119894 119895 le 119873 (4)

The specification of themodel can be further simplified byobserving that at each time 119905 the matrix Ad

119905

can be decom-posed in two terms the recurrent and the nonrecurrent statetransition probabilities

The recurrent transition probabilities P(d119905) = [119901

119894119895(d

119905)]

which depend only on the duration vector d119905and the

parameters Θ take into account the dynamics of the self-transition probabilities It is defined as the probability ofremaining in the current state at the next time step given theduration spent in the current state prior to time 119905

119901119894119894(d

119905) = P (119904

119905+1= 119878

119894| 119904

119905= 119878

119894 119889

119905(119894))

= P (119904119905+1

= 119878119894|

119904119905= 119878

119894 119904

119905minus1= 119878

119894 119904

119905minus119889119905(119894)+1

= 119878119894 119904

119905minus119889119905(119894)

= 119878119894)

= (P (119904119905+1

= 119878119894 119904

119905= 119878

119894 119904

119905minus119889119905(119894)+2

= 119878119894|

119904119905minus119889119905(119894)+1

= 119878119894 119904

119905minus119889119905(119894)

= 119878119894))

sdot (P (119904119905= 119878

119894 119904

119905minus1= 119878

119894 119904

119905minus119889119905(119894)+2

= 119878119894|

119904119905minus119889119905(119894)+1

= 119878119894 119904

119905minus119889119905(119894)

= 119878119894))

minus1

(5)

The denominator in (5) can be expressed as suminfin

119896=1P(119904

119905+119896=

119878119894 119904

119905+119896minus1= 119878

119894 119904

119905minus119889119905(119894)+2

= 119878119894| 119904

119905minus119889119905(119894)+1

= 119878119894 119904

119905minus119889119905(119894)

= 119878119894)

which is the probability that the system at time 119905 has beenstaying in state 119878

119894for at least 119889

119905(119894) minus 1 time units The above

expression is equivalent to 1 minus 119865(119889119905(119894) minus 1 120579

119894) where 119865(sdot 120579

119894)

is the duration cumulative distribution function relative tothe the state 119878

119894 that is 119865(119889 120579) = int

119889

minusinfin119891(119909 120579)119889119909 As a

consequence from (5) we can define the recurrent transitionprobabilities as a diagonal matrix with dimensions119873times119873 as

P (d119905) = [119901

119894119895(d

119905)] =

1 minus 119865 (119889119905(119894) 120579

119894)

1 minus 119865 (119889119905(119894) minus 1 120579

119894)

if 119894 = 119895

0 if 119894 = 119895

(6)

The usage of the cumulative functions in (6) which tendto 1 as the duration tends to infinity suggests that theprobability of self-transition tends to decrease as the sojourntime increases leading the model to always leave the currentstate if time approaches infinity

The nonrecurrent state transition probabilities A0=

[1198860

119894119895] rule the transitions between two different states It is

4 Mathematical Problems in Engineering

represented by a 119873 times 119873 matrix with the diagonal elementsequal to zero defined as

A0= [119886

0

119894119895] =

0 if 119894 = 119895

P (119904119905+1

= 119878119895| 119904

119905= 119878

119894) if 119894 = 119895

(7)

A0 must be specified as a stochastic matrix that is its ele-ments have to satisfy the constraintsum119873

119895=11198860

119894119895= 1 for all 119894

As a consequence of the above decomposition thedynamic of the underlying semi-Markov chain can be definedby specifying only the state-dependent duration parametersΘ and the nonrecurrentmatrixA0 since themodel transitionmatrix can be calculated at each time 119905 using (6) and (7)

Ad119905

= P (d119905) + (I minus P (d

119905))A0

(8)

where I is the identity matrix If we denote the elements ofthe dynamic transition matrix Ad

119905

as 119886119894119895(d

119905) the stochastic

constraint sum119873

119895=1119886119894119895(d

119905) = 1 for all 119894 and 119905 is guaranteed from

the fact that P(d119905) is a diagonal matrix and A0 is a stochastic

matrixFor several applications it is necessary to model the

absorbing state which in the case of industrial equipmentcorresponds to the ldquobrokenrdquo or ldquofailurerdquo state If we denote theabsorbing state as 119878

119896with 119896 isin [1119873] we must fix the 119896th row

of the nonrecurrentmatrixA0 to be 1198860119896119896= 1 and 1198860

119896119894= 0 for all

1 le 119894 le 119873 with 119894 = 119896 By substituting such A0 matrix in (8)it is easy to show that the element 119886

119896119896(d

119905) = 1 and remains

constant for all 119905 while the duration probability parameters120579119896are not influent for the absorbing state 119878

119896 An example of

absorbing state specification will be given in Section 5With respect to the input observation signals in this work

we consider both continuous and discrete data by adaptingthe suitable observationmodel depending on the observationnature In particular for the continuous case we modelthe observations with a multivariate mixture of Gaussiansdistributions This choice presents two main advantages (i)a multivariate model allows to deal with multiple observa-tions at the same time this is often the case of industrialequipments modeling since at each time multiple sensorsrsquomeasurements are available and (ii) mixture of Gaussians hasbeen proved to closely approximate any finite and continuousdensity function [33] Formally if we denote by x

119905the

observation vector at time 119905 and the generic observationvector beingmodeled as x the observation density for the 119895thstate is represented by a finite mixture of119872 gaussians

119887119895(x) =

119872

sum

119898=1

119888119895119898N (x120583

119895119898U

119895119898) 1 le 119895 le 119873 (9)

where 119888119895119898

is the mixture coefficient for the 119898th mixture instate 119878

119895 which satisfies the stochastic constraintsum119872

119898=1119888119895119898

= 1

for 1 le 119895 le 119873 and 119888119895119898

ge 0 for 1 le 119895 le 119873 and 1 le 119898 le 119872while N is the Gaussian density with mean vector 120583

119895119898and

covariance matrix U119895119898

for the 119898th mixture component instate 119895

In case of discrete data we model the observationswithin each state with a nonparametric discrete probability

distribution In particular if 119871 is the number of distinctobservation symbols per state and if we denote the symbolsas 119883 = 119883

1 119883

119871 and the observation at time 119905 as 119909

119905 the

observation symbol probability distribution can be defined asa matrix 119861 = [119887

119895(119897)] of dimensions119873 times 119871 where

119887119895(119897) = P [119909

119905= 119883

119897| 119904

119905= 119878

119895] 1 le 119895 le 119873 1 le 119897 le 119871 (10)

Since the system in each state at each time step can emit oneof the possible 119871 symbols the matrix 119861 is stochastic that is itis constrained to sum119871

119897=1119887119895(119897) = 1 for all 1 le 119895 le 119873

Finally as in the case of HMMs we specify the initial statedistribution 120587 = 120587

119894 which defines the probability of the

starting state as

120587119894= P [119904

1= 119878

119894] 1 le 119894 le 119873 (11)

From the above considerations, two different HSMM models can be considered: in the case of continuous observations, $\lambda = (\mathbf{A}^0, \Theta, C, \mu, U, \pi)$, while in the case of discrete observations the HSMM is characterized by $\lambda = (\mathbf{A}^0, \Theta, B, \pi)$. An example of a continuous HSMM with 3 states is shown in Figure 1.

2.2. Learning and Inference Algorithms. Let us denote the generic sequence of observations, being indiscriminately continuous vectors or discrete symbols, as $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$. In order to use the defined HSMM model in practice, similarly to the HMM, we need to solve three basic problems:

(1) Given the observation $\mathbf{x}$ and a model $\lambda$, calculate the probability that the sequence $\mathbf{x}$ has been generated by the model $\lambda$, that is, $\mathrm{P}(\mathbf{x} \mid \lambda)$.

(2) Given the observation $\mathbf{x}$ and a model $\lambda$, calculate the state sequence $S = s_1 s_2 \cdots s_T$ which has most probably generated the sequence $\mathbf{x}$.

(3) Given the observation $\mathbf{x}$, find the parameters of the model $\lambda$ which maximize $\mathrm{P}(\mathbf{x} \mid \lambda)$.

As in the case of HMMs, solving the above problems requires the forward-backward [13], decoding (Viterbi [46] and Forney [47]), and Expectation-Maximization [48] algorithms, which will be adapted to the HSMM introduced in Section 2.1. In the following, we also propose a more effective estimator of the state duration variable $d_t(i)$ defined in (2).

2.2.1. The Forward-Backward Algorithm. Given a generic sequence of observations $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, the goal is to calculate the model likelihood, that is, $\mathrm{P}(\mathbf{x} \mid \lambda)$. This quantity is useful for the training procedure, where the parameters that locally maximize the model likelihood are chosen, as well as for classification problems. The latter is the case in which the observation sequence $\mathbf{x}$ has to be mapped to one of a finite set of $C$ classes, represented by a set of HSMM parameters $\mathcal{L} = \{\lambda_1, \ldots, \lambda_C\}$. The class of $\mathbf{x}$ is chosen such that $\lambda(\mathbf{x}) = \arg\max_{\lambda \in \mathcal{L}} \mathrm{P}(\mathbf{x} \mid \lambda)$.

To calculate the model likelihood, we first define the forward variable at each time $t$ as

$$\alpha_t(i) = \mathrm{P}\left(\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t,\; s_t = S_i \mid \lambda\right), \qquad 1 \le i \le N. \tag{12}$$

Figure 1: Graphical representation of an HSMM with three hidden states $S_1$, $S_2$, $S_3$, their observation probabilities $P(o \mid S_i)$, sojourn probability densities $d_i(u)$, and nonrecurrent transitions $a_{12}$, $a_{23}$.

Contrary to HMMs, for HSMMs the state duration must be taken into account in the calculation of the forward variable. Consequently, Yu [26] proposed the following inductive formula:

$$\alpha_t(j, d) = \sum_{d'=1}^{D} \sum_{i=1}^{N} \left( \alpha_{t-d'}(i, d')\, a^0_{ij}\, p_{jj}(d') \prod_{k=t-d+1}^{t} b_j(\mathbf{x}_k) \right), \qquad 1 \le j \le N,\; 1 \le t \le T, \tag{13}$$

that is, the sum of the probabilities of being in the current state $S_j$ (at time $t$) for the past $d'$ time units (with $1 \le d' \le D$ and $D$ the maximum allowed duration for each state), coming from all the possible previous states $S_i$, $1 \le i \le N$, $i \neq j$.

The disadvantage of the above formulation is that, as discussed in the Introduction, the specification of the maximum duration $D$ represents a limitation to the generality of the model. Moreover, from (13) it is clear that the computation and memory complexities drastically increase with $D$, which can be very large in many applications, in particular for online failure prediction.

To alleviate this problem, Azimi et al. [30–32] introduced a new forward algorithm for HSMMs that, by keeping track of the estimated average state duration at each iteration, has a computational complexity comparable to the forward algorithm for HMMs [13]. However, the average state duration represents an approximation; consequently, the forward algorithm of Azimi, compared with (13), pays the price of a lower precision in favor of an (indispensable) better computational efficiency.

To calculate the forward variable $\alpha_t(j)$ using Azimi's approach, the duration-dependent transition matrix defined in (8) is taken into consideration in the induction formula of (13), which becomes [30]

$$\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}(\mathbf{d}_{t-1}) \right] b_j(\mathbf{x}_t). \tag{14}$$

To calculate the above formula, the average state duration of (2) must be estimated for each time $t$ by means of the variable $\mathbf{d}_t = [d_t(i)]$, defined as

$$d_t(i) = \mathbb{E}\left(d_t(i) \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t,\; s_t = S_i,\; \lambda\right), \qquad 1 \le i \le N, \tag{15}$$

where $\mathbb{E}$ denotes the expected value. To calculate the above quantity, Azimi et al. [30–32] use the following formula:

$$\mathbf{d}_t = \boldsymbol{\gamma}_{t-1} \odot \mathbf{d}_{t-1} + 1, \tag{16}$$

where $\odot$ represents the element-by-element product between two matrices/vectors, and the vector $\boldsymbol{\gamma}_t = [\gamma_t(i)]$ (the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters), with dimensions $N \times 1$, is calculated in terms of $\alpha_t(i)$ as

$$\gamma_t(i) = \mathrm{P}\left(s_t = S_i \mid \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t,\; \lambda\right) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}, \qquad 1 \le i \le N. \tag{17}$$

Equation (16) is based on the following induction formula [30–32], which rules the dynamics of the duration vector when the system's state is known:

$$d_t(i) = s_{t-1}(i) \cdot d_{t-1}(i) + 1, \tag{18}$$

where, for each $t$, $s_t(i)$ is 1 if $s_t = S_i$ and 0 otherwise.


A simple example shows that (18) is incorrect: assuming an HSMM with three states and considering the state sequence $(S_1, S_1, S_2, \ldots)$, the correct sequence of the duration vector is $\mathbf{d}_1 = [1, 1, 1]^T$, $\mathbf{d}_2 = [2, 1, 1]^T$, and $\mathbf{d}_3 = [1, 1, 1]^T$, where the superscript $T$ denotes vector transpose. If we apply (18), we obtain $\mathbf{d}_1 = [1, 1, 1]^T$, $\mathbf{d}_2 = [2, 1, 1]^T$, and $\mathbf{d}_3 = [1, 2, 1]^T$, which is in contradiction with the definition of the state duration vector given in (2).

To calculate the average state duration variable $d_t(i)$, we propose a new induction formula that estimates, for each time $t$, the time spent in the $i$th state prior to $t$, as

$$d_t(i) = \mathrm{P}\left(s_{t-1} = S_i \mid s_t = S_i,\; \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \cdot \left(d_{t-1}(i) + 1\right) \tag{19}$$

$$\phantom{d_t(i)} = \frac{a_{ii}(\mathbf{d}_{t-1}) \cdot \alpha_{t-1}(i) \cdot b_i(\mathbf{x}_t)}{\alpha_t(i)} \cdot \left(d_{t-1}(i) + 1\right), \qquad 1 \le i \le N. \tag{20}$$

The derivation of (20) is given in the Appendix. The intuition behind (19) is that the current average duration is the previous average duration plus one, weighted by the "amount" of the current state that was already in state $S_i$ in the previous step.

Using the proposed (20), the forward algorithm can be specified as follows:

(1) Initialization, with $1 \le i \le N$:

$$\alpha_1(i) = \pi_i\, b_i(\mathbf{x}_1), \qquad d_1(i) = 1, \qquad \mathbf{A}_{\mathbf{d}_1} = P(\mathbf{d}_1) + \left(\mathbf{I} - P(\mathbf{d}_1)\right)\mathbf{A}^0, \tag{21}$$

where $P(\mathbf{d}_t)$ is estimated using (6).

(2) Induction, with $1 \le j \le N$ and $1 \le t \le T-1$:

$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}(\mathbf{d}_t)\right] b_j(\mathbf{x}_{t+1}), \tag{22}$$

$$d_{t+1}(i) = \frac{a_{ii}(\mathbf{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(d_t(i) + 1\right), \tag{23}$$

$$\mathbf{A}_{\mathbf{d}_{t+1}} = P(\mathbf{d}_{t+1}) + \left(\mathbf{I} - P(\mathbf{d}_{t+1})\right)\mathbf{A}^0, \tag{24}$$

where $a_{ij}(\mathbf{d}_t)$ are the coefficients of the matrix $\mathbf{A}_{\mathbf{d}_t}$.

(3) Termination:

$$\mathrm{P}(\mathbf{x} \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i). \tag{25}$$
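The following sketch illustrates one possible implementation of the forward recursion (21)–(25). It is a simplified illustration rather than the authors' code: the duration-dependent transition matrix is built as $P(\mathbf{d}_t) + (\mathbf{I} - P(\mathbf{d}_t))\mathbf{A}^0$, where $P(\mathbf{d}_t)$ is taken here as a diagonal matrix of self-transition probabilities derived from a Gamma survival-function ratio; this `duration_prob` helper is a hypothetical stand-in for (6), which is not reproduced in this section. Emission probabilities are passed in as a precomputed $T \times N$ matrix.

```python
import numpy as np
from scipy.stats import gamma

def duration_prob(d, shape, scale):
    """Hypothetical stand-in for (6): probability of remaining in a state that
    has already lasted d steps, from a Gamma survival-function ratio."""
    sf = gamma.sf([d, d + 1], a=shape, scale=scale)
    return sf[1] / sf[0] if sf[0] > 0 else 0.0

def forward(b, A0, pi, dur_params):
    """Forward pass of (21)-(25).
    b          : (T, N) emission probabilities b_j(x_t)
    A0         : (N, N) nonrecurrent transition matrix
    pi         : (N,) initial state distribution
    dur_params : list of (shape, scale) pairs, one per state
    """
    T, N = b.shape
    alpha = np.zeros((T, N))
    d = np.ones(N)                                   # d_1(i) = 1
    alpha[0] = pi * b[0]                             # (21)
    for t in range(T - 1):
        P = np.diag([duration_prob(d[i], *dur_params[i]) for i in range(N)])
        A = P + (np.eye(N) - P) @ A0                 # duration-dependent matrix A_{d_t}
        alpha[t + 1] = (alpha[t] @ A) * b[t + 1]     # (22)
        d = (np.diag(A) * alpha[t] * b[t + 1]
             / np.maximum(alpha[t + 1], 1e-300)) * (d + 1)   # proposed update (23)
    return alpha, alpha[-1].sum()                    # (25): P(x | lambda)
```

In practice, scaling of the forward variables is needed for long sequences to avoid numerical underflow, exactly as for standard HMMs [13].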

Similar considerations as for the forward procedure can be made for the backward algorithm, which is implemented by defining the variable $\beta_t(i)$ as

$$\beta_t(i) = \mathrm{P}\left(\mathbf{x}_{t+1} \mathbf{x}_{t+2} \cdots \mathbf{x}_T \mid s_t = S_i,\; \lambda\right), \qquad 1 \le i \le N. \tag{26}$$

Having estimated the dynamic transition matrix $\mathbf{A}_{\mathbf{d}_t}$ for each $1 \le t \le T$ using (24), the backward variable can be calculated inductively as follows:

(1) Initialization:

$$\beta_T(i) = 1, \qquad 1 \le i \le N. \tag{27}$$

(2) Induction:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}(\mathbf{d}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j), \qquad t = T-1, T-2, \ldots, 1,\; 1 \le i \le N. \tag{28}$$

Although the variable $\beta_t(i)$ is not necessary for the calculation of the model likelihood, it will be useful in the parameter reestimation procedure, as explained in Section 2.2.3.

2.2.2. The Viterbi Algorithm. The Viterbi algorithm [46, 47] (also known as decoding) allows determining the best state sequence corresponding to a given observation sequence. Formally, given a sequence of observations $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, the best state sequence $S^* = s^*_1 s^*_2 \cdots s^*_T$ corresponding to $\mathbf{x}$ is calculated by defining the variable $\delta_t(i)$ as

$$\delta_t(i) = \max_{s_1, s_2, \ldots, s_{t-1}} \mathrm{P}\left(s_1 s_2 \cdots s_t = S_i,\; \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t \mid \lambda\right). \tag{29}$$

The procedure to recursively calculate the variable $\delta_t(i)$ and to retrieve the target state sequence (i.e., the arguments which maximize the $\delta_t(i)$'s) for the proposed HSMM is a straightforward extension of the Viterbi algorithm for HMMs [13]. The only change is the usage, in the recursive calculation of $\delta_t(i)$, of the dynamic transition matrix $\mathbf{A}_{\mathbf{d}_t} = [a_{ij}(\mathbf{d}_t)]$ calculated through (24). The Viterbi algorithm for the introduced parametric HSMMs can be summarized as follows:

(1) Initialization, with $1 \le i \le N$:

$$\delta_1(i) = \pi_i\, b_i(\mathbf{x}_1), \qquad \psi_1(i) = 0. \tag{30}$$

(2) Recursion, with $1 \le j \le N$ and $2 \le t \le T$:

$$\delta_t(j) = \max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}(\mathbf{d}_t)\right] b_j(\mathbf{x}_t), \tag{31}$$

$$\psi_t(j) = \arg\max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}(\mathbf{d}_t)\right]. \tag{32}$$

(3) Termination:

$$P^* = \max_{1 \le i \le N} \left[\delta_T(i)\right], \tag{33}$$

$$s^*_T = \arg\max_{1 \le i \le N} \left[\delta_T(i)\right], \tag{34}$$

where we keep track of the argument maximizing (31) using the vector $\psi_t$, which, tracked back, gives the desired best state sequence:

$$s^*_t = \psi_{t+1}\left(s^*_{t+1}\right), \qquad t = T-1, T-2, \ldots, 1. \tag{35}$$
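A compact sketch of this decoding step is given below (again an illustration under assumptions, not the paper's code): it takes the per-time-step dynamic transition matrices $\mathbf{A}_{\mathbf{d}_t}$ as precomputed inputs, for example from the forward pass above, and returns the most likely state path.

```python
import numpy as np

def viterbi(b, A_seq, pi):
    """Viterbi decoding of (30)-(35).
    b     : (T, N) emission probabilities b_j(x_t)
    A_seq : (T, N, N) dynamic transition matrices, A_seq[t][i, j] = a_ij(d_t)
    pi    : (N,) initial state distribution
    """
    T, N = b.shape
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * b[0]                               # (30)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A_seq[t]      # delta_{t-1}(i) * a_ij(d_t)
        psi[t] = scores.argmax(axis=0)                 # (32)
        delta[t] = scores.max(axis=0) * b[t]           # (31)
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()                      # (34)
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]              # backtracking of (35)
    return path, delta
```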


2.2.3. The Training Algorithm. The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by $\lambda = (\mathbf{A}^0, \Theta, C, \mu, U, \pi)$ if the observations are continuous, or by $\lambda = (\mathbf{A}^0, \Theta, B, \pi)$ if the observations are discrete. Given a generic observation sequence $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_T$, referred to as the training set in the following, the training procedure consists of finding the model parameter set $\lambda^*$ which locally maximizes the model likelihood $\mathrm{P}(\mathbf{x} \mid \lambda)$.

We use the modified Baum-Welch algorithm of Azimi et al. [30–32]. However, in our implementation we do not make any assumption on the density function used to model the state duration, and we consider both continuous and discrete observations.

Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure which consists of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters $\lambda^0$ and is iterated until the likelihood function does not improve between two consecutive iterations.

Similarly to HMMs, the reestimation formulas are derived by firstly introducing the variable $\xi_t(i,j)$, which represents the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$, given the model and the observation sequence:

$$\xi_t(i,j) = \mathrm{P}\left(s_t = S_i,\; s_{t+1} = S_j \mid \mathbf{x},\; \lambda\right). \tag{36}$$

However, in the HSMM case, the variable $\xi_t(i,j)$ considers the duration estimation performed in the forward algorithm (see (24)). Formulated in terms of the forward and backward variables, it is given by

$$\xi_t(i,j) = \frac{\mathrm{P}\left(s_t = S_i,\; s_{t+1} = S_j,\; \mathbf{x} \mid \lambda\right)}{\mathrm{P}(\mathbf{x} \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}(\mathbf{d}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}(\mathbf{d}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}. \tag{37}$$

From $\xi_t(i,j)$ we can derive the quantity $\gamma_t(i)$ (already defined in (17)), representing the probability of being in state $S_i$ at time $t$ given the observation sequence and the model parameters:

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j). \tag{38}$$
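As a small illustration (not the authors' code; it assumes the $\alpha$, $\beta$, emission, and $\mathbf{A}_{\mathbf{d}_t}$ arrays produced by the sketches above), (37) and (38) can be computed in vectorized form as follows.

```python
import numpy as np

def xi_gamma(alpha, beta, b, A_seq):
    """Posteriors of (37)-(38).
    alpha, beta : (T, N) forward/backward variables
    b           : (T, N) emission probabilities
    A_seq       : (T, N, N) dynamic transition matrices A_{d_t}
    Returns xi of shape (T-1, N, N) and gamma of shape (T-1, N).
    """
    T, N = alpha.shape
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        num = alpha[t][:, None] * A_seq[t] * b[t + 1][None, :] * beta[t + 1][None, :]
        xi[t] = num / num.sum()        # normalization in (37)
    gamma = xi.sum(axis=2)             # (38)
    return xi, gamma
```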

Finally, the reestimation formulas for the parameters $\pi$ and $\mathbf{A}^0$ are given by

$$\bar{\pi}_i = \gamma_1(i), \tag{39}$$

$$\bar{a}^0_{ij} = \frac{\left(\sum_{t=1}^{T-1} \xi_t(i,j)\right) \odot G}{\sum_{j=1}^{N} \left(\sum_{t=1}^{T-1} \xi_t(i,j)\right) \odot G}, \tag{40}$$

where $G = [g_{ij}]$ is a square matrix of dimensions $N \times N$, with $g_{ij} = 0$ for $i = j$ and $g_{ij} = 1$ for $i \neq j$, and $\odot$ represents the element-by-element product between two matrices; $\sum_{t=1}^{T-1} \gamma_t(i)$ is the expected number of transitions from state $S_i$, and $\sum_{t=1}^{T-1} \xi_t(i,j)$ is the expected number of transitions from state $S_i$ to state $S_j$.

Equation (39) represents the expected number of times that the model starts in state $S_i$, while (40) represents the expected number of transitions from state $S_i$ to state $S_j$, with $i \neq j$, over the total expected number of transitions from state $S_i$ to any other state different from $S_i$.

The matrix $\mathbf{A}^0$ being normalized, the stochastic constraints are satisfied at each iteration, that is, $\sum_{j=1}^{N} \bar{a}^0_{ij} = 1$ for each $1 \le i \le N$, while the estimate of the prior probability $\bar{\pi}_i$ inherently sums up to 1 at each iteration, since it represents the expected frequency of state $S_i$ at time $t = 1$, for each $1 \le i \le N$.

With respect to the reestimation of the state duration parameters $\Theta$, we first estimate the mean $\mu_{i,d}$ and the variance $\sigma^2_{i,d}$ of the $i$th state duration, for each $1 \le i \le N$, from the forward and backward variables and from the estimate of the state duration variable:

$$\mu_{i,d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1,\, j \neq i}^{N} a_{ij}(d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right) d_t(i)}{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1,\, j \neq i}^{N} a_{ij}(d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)}, \tag{41}$$

$$\sigma^2_{i,d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1,\, j \neq i}^{N} a_{ij}(d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right) \left(d_t(i) - \mu_{i,d}\right)^2}{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1,\, j \neq i}^{N} a_{ij}(d_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)}, \tag{42}$$

where (41) can be interpreted as the probability of a transition from state $S_i$ to $S_j$, with $i \neq j$, at time $t$, weighted by the duration of state $S_i$ at $t$, giving the desired expected value, while in (42) the same quantity is weighted by the squared distance of the duration at time $t$ from its mean, giving the estimate of the variance.

Then the parameters of the desired duration distribution can be estimated from $\mu_{i,d}$ and $\sigma^2_{i,d}$. For example, if a Gamma distribution with shape parameter $\nu$ and scale parameter $\eta$ is chosen to model the state duration, the parameters $\nu_i$ and $\eta_i$, for each $1 \le i \le N$, can be calculated as $\nu_i = \mu^2_{i,d}/\sigma^2_{i,d}$ and $\eta_i = \sigma^2_{i,d}/\mu_{i,d}$.
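This moment-matching step is straightforward to implement; the sketch below is an illustration of the relations just stated (the Gaussian case, where the estimated moments are used directly, is added as an obvious companion and is not spelled out in the text).

```python
def gamma_from_moments(mu_d, var_d):
    """Moment matching for a Gamma duration model: shape nu and scale eta
    such that mean = nu * eta and variance = nu * eta**2."""
    nu = mu_d ** 2 / var_d
    eta = var_d / mu_d
    return nu, eta

def gaussian_from_moments(mu_d, var_d):
    """For a Gaussian duration model the estimated moments are the parameters."""
    return mu_d, var_d

# Example: a state with estimated sojourn mean 100 and variance 20
print(gamma_from_moments(100.0, 20.0))   # (500.0, 0.2), cf. the simulated setup of Section 5.1.1
```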


Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the reestimation formulas are the same as for Hidden Markov Models [13].

In particular, for continuous observations, the parameters of the Gaussian mixture defined in (9) are reestimated by firstly defining the probability of being in state $S_j$ at time $t$ with the observation vector $\mathbf{x}_t$ evaluated by the $k$th mixture component as

$$\gamma_t(j,k) = \left[\frac{\alpha_t(j)\, \beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}\right] \cdot \left[\frac{c_{jk}\, \mathcal{N}\!\left(\mathbf{x}_t;\, \boldsymbol{\mu}_{jk},\, \mathbf{U}_{jk}\right)}{\sum_{m=1}^{M} c_{jm}\, \mathcal{N}\!\left(\mathbf{x}_t;\, \boldsymbol{\mu}_{jm},\, \mathbf{U}_{jm}\right)}\right]. \tag{43}$$

By using the former quantity, the parameters $c_{jk}$, $\boldsymbol{\mu}_{jk}$, and $\mathbf{U}_{jk}$ are reestimated through the following formulas:

$$\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j,m)}, \qquad \bar{\boldsymbol{\mu}}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(j,k)}, \qquad \bar{\mathbf{U}}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, \left(\mathbf{x}_t - \bar{\boldsymbol{\mu}}_{jk}\right)\left(\mathbf{x}_t - \bar{\boldsymbol{\mu}}_{jk}\right)^T}{\sum_{t=1}^{T} \gamma_t(j,k)}, \tag{44}$$

where the superscript $T$ denotes vector transpose.

For discrete observations, the reestimation formula for the observation matrix $b_j(l)$ is

$$\bar{b}_j(l) = \frac{\sum_{t=1,\, x_t = X_l}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \tag{45}$$

where the quantity $\gamma_t(j)$, which takes into account the duration-dependent forward variable $\alpha_t(j)$, is calculated through (17). The reader is referred to Rabiner's work [13] for the interpretation of the observation parameter reestimation formulas.
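For instance, the discrete reestimation of (45) amounts to a weighted symbol histogram per state; a minimal sketch (not the paper's code) is:

```python
import numpy as np

def reestimate_B(gamma, symbols, L):
    """Reestimate b_j(l) of (45).
    gamma   : (T, N) state posteriors gamma_t(j)
    symbols : (T,) observed symbol indices in 0..L-1
    """
    T, N = gamma.shape
    B = np.zeros((N, L))
    for l in range(L):
        B[:, l] = gamma[symbols == l].sum(axis=0)   # numerator of (45) per symbol
    return B / B.sum(axis=1, keepdims=True)         # normalize rows to sum to 1
```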

3. AIC-Based Model Selection

In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states $N$, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures $M$ to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been observed that, in the case of complex models and in the presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches such as the Bayesian Information Criterion.

In general, information criteria are represented as a two-term structure. They account for a compromise between a measure of model fitness, which is based on the likelihood of the model, and a penalty term, which takes into account the model complexity. Usually the model complexity is measured in terms of the number of parameters that have to be estimated and in terms of the number of observations.

The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as

$$\mathrm{AIC} = \frac{-\log L(\hat{\lambda}) + p}{T}, \tag{46}$$

where $L(\hat{\lambda})$ is the likelihood of the model with the estimated parameters, as defined in (25), $p$ is the number of model parameters, and $T$ is the length of the observed sequence. The best model is the one minimizing (46).

Concerning $p$, the number of parameters to be estimated for a parametric HSMM with $N$ states is $p = p_h + p_o$, where $p_h$ counts the parameters of the hidden state layer while $p_o$ counts those of the observation layer.

In particular, $p_h = (N-1) + (N-1) \cdot N + z \cdot N$, where

(i) $N-1$ accounts for the prior probabilities $\pi$;
(ii) $(N-1) \cdot N$ accounts for the nonrecurrent transition matrix $\mathbf{A}^0$;
(iii) $z \cdot N$ accounts for the duration probabilities, $z$ being the number of parameters $\theta$ of the duration distribution.

Concerning $p_o$, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with $L$ possible observable values, $p_o = (L-1) \cdot N$, which accounts for the elements of the observation matrix $B$;
(ii) if the observations are continuous and a multivariate mixture of $M$ Gaussians with $O$ variates is used as observation model, $p_o = [O \cdot N \cdot M] + [O \cdot O \cdot N \cdot M] + [(M-1) \cdot N]$, where each term accounts, respectively, for the mean vectors $\mu$, the covariance matrices $U$, and the mixture coefficients $C$.
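The parameter count and the criterion above translate directly into code; the sketch below (an illustration of (46) and of the counting rules, not the authors' implementation) evaluates the AIC for a candidate structure given the log-likelihood returned by the forward pass.

```python
def num_parameters(N, z, continuous, M=1, O=1, L=None):
    """Number of free parameters p = p_h + p_o of a parametric HSMM."""
    p_h = (N - 1) + (N - 1) * N + z * N
    if continuous:
        p_o = O * N * M + O * O * N * M + (M - 1) * N
    else:
        p_o = (L - 1) * N
    return p_h + p_o

def aic(log_likelihood, p, T):
    """Akaike Information Criterion of (46); lower is better."""
    return (-log_likelihood + p) / T

# Example: N = 5 states, Gamma durations (z = 2), bivariate single-Gaussian observations
p = num_parameters(N=5, z=2, continuous=True, M=1, O=2)
print(p, aic(log_likelihood=-1500.0, p=p, T=650))   # -1500.0 is a hypothetical value
```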

4. Remaining Useful Lifetime Estimation

One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time $D$ before entering a given state.

As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state $S_k$ that represents the failure condition is identified, at each moment the RUL can be defined as the expected time $D$ to reach the failure state $S_k$. If we assume that the time to failure is a random variable $D$ following a given probability density, we define the RUL at the current time $t$ as

$$\mathrm{RUL}_t = \bar{D} = \mathbb{E}(D), \qquad s_{t+\bar{D}} = S_k,\; s_{t+\bar{D}-1} = S_i, \qquad 1 \le i, k \le N,\; i \neq k, \tag{47}$$

where $\mathbb{E}$ denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.

The estimation of the current state is performed via the Viterbi path, that is, the variable $\boldsymbol{\delta}_t = [\delta_t(i)]_{1 \le i \le N}$ defined in (29). To correctly model the uncertainty of the current state estimate, we use the normalized variable $\bar{\delta}_t(i)$, obtained as

$$\bar{\delta}_t(i) = \max_{s_1, s_2, \ldots, s_{t-1}} \mathrm{P}\left(s_t = S_i \mid s_1 s_2 \cdots s_{t-1},\; \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t,\; \lambda\right) = \frac{\delta_t(i)}{\sum_{j=1}^{N} \delta_t(j)}, \qquad 1 \le i \le N, \tag{48}$$

which is an estimate of the probability of being in state $S_i$ at time $t$.

Together with the normalized variable $\bar{\delta}_t(i)$, the maximum a posteriori estimate of the current state, $s^*_t$, is taken into account, according to (34). If $s^*_t$ coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimate of the average remaining time in the current state, $d_{\mathrm{avg}}(s^*_t)$, is calculated as

$$d_{\mathrm{avg}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - d_t(i)\right) \odot \bar{\delta}_t(i), \tag{49}$$

where $\mu_{d_i}$ denotes the expected value of the duration variable in state $S_i$, according to the duration distribution specified by the parameters $\theta_i$. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration $d_t(i)$ at time $t$ from the expected sojourn time of state $S_i$, weighting the result by the uncertainty about the current state, $\bar{\delta}_t(i)$, and finally summing up the contributions from each state.

In addition to the average remaining time, a lower and an upper bound value can be calculated, based on the standard deviation $\sigma_{d_i}$ of the duration distribution for state $S_i$:

$$d_{\mathrm{low}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i} - d_t(i)\right) \odot \bar{\delta}_t(i), \tag{50}$$

$$d_{\mathrm{up}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i} - d_t(i)\right) \odot \bar{\delta}_t(i). \tag{51}$$

Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimate:

$$\bar{\boldsymbol{\delta}}_{\mathrm{next}} = \left[\bar{\delta}_{t+d}(i)\right]_{1 \le i \le N} = \left(\mathbf{A}^0\right)^T \cdot \bar{\boldsymbol{\delta}}_t, \tag{52}$$

while the maximum a posteriori estimate of the next state, $s^*_{\mathrm{next}}$, is calculated as

$$s^*_{\mathrm{next}} = s^*_{t+d} = \arg\max_{1 \le i \le N} \bar{\delta}_{t+d}(i). \tag{53}$$

Again, if $s^*_{t+d}$ coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimate of the failure time is $D_{\mathrm{avg}} = d_{\mathrm{avg}}(s^*_t)$, calculated at the previous step, with the bound values $D_{\mathrm{low}} = d_{\mathrm{low}}(s^*_t)$ and $D_{\mathrm{up}} = d_{\mathrm{up}}(s^*_t)$. Otherwise, the estimate of the sojourn time of the next state is calculated as follows:

$$d_{\mathrm{avg}}(s^*_{t+d}) = \sum_{i=1}^{N} \mu_{d_i} \odot \bar{\delta}_{t+d}(i), \tag{54}$$

$$d_{\mathrm{low}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i}\right) \odot \bar{\delta}_{t+d}(i), \tag{55}$$

$$d_{\mathrm{up}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i}\right) \odot \bar{\delta}_{t+d}(i). \tag{56}$$

This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:

$$D_{\mathrm{avg}} = \sum d_{\mathrm{avg}}, \tag{57}$$

$$D_{\mathrm{low}} = \sum d_{\mathrm{low}}, \tag{58}$$

$$D_{\mathrm{up}} = \sum d_{\mathrm{up}}. \tag{59}$$

Finally, Algorithm 1 details the RUL estimation procedure described above.

5. Experimental Results

To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and on real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


(1) function RulEstimation($\mathbf{x}_t$, $S_k$)   ⊳ $\mathbf{x}_t$: the last observation acquired
(2)                                                  ⊳ $S_k$: the failure state
(3) Initialization:
(4)   $D_{\mathrm{avg}} \leftarrow 0$
(5)   $D_{\mathrm{low}} \leftarrow 0$
(6)   $D_{\mathrm{up}} \leftarrow 0$
(7) Current state estimation:
(8)   Calculate $\bar{\boldsymbol{\delta}}_t$   ⊳ using (48)
(9)   Calculate $s^*_t$   ⊳ using (34)
(10)  Calculate $\mathbf{d}_t$   ⊳ using (20)
(11)  $S \leftarrow s^*_t$
(12) Loop:
(13) while $S \neq S_k$ do
(14)   Calculate $d_{\mathrm{avg}}$   ⊳ using (49) or (54)
(15)   Calculate $d_{\mathrm{low}}$   ⊳ using (50) or (55)
(16)   Calculate $d_{\mathrm{up}}$   ⊳ using (51) or (56)
(17)   $D_{\mathrm{avg}} \leftarrow D_{\mathrm{avg}} + d_{\mathrm{avg}}$
(18)   $D_{\mathrm{low}} \leftarrow D_{\mathrm{low}} + d_{\mathrm{low}}$
(19)   $D_{\mathrm{up}} \leftarrow D_{\mathrm{up}} + d_{\mathrm{up}}$
(20)   Calculate $\bar{\boldsymbol{\delta}}_{\mathrm{next}}$   ⊳ using (52)
(21)   Calculate $s^*_{\mathrm{next}}$   ⊳ using (53)
(22)   $S \leftarrow s^*_{\mathrm{next}}$
     end while
(23) return $D_{\mathrm{avg}}$, $D_{\mathrm{low}}$, $D_{\mathrm{up}}$

Algorithm 1: Remaining Useful Lifetime estimation (pseudo-code).
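For concreteness, a compact Python transcription of Algorithm 1 is given below. It is a sketch, not the authors' implementation: the normalized Viterbi variable $\bar{\boldsymbol{\delta}}_t$ and the duration estimates $\mathbf{d}_t$ are assumed to have been computed online by the decoding and forward sketches above, and `mu_d`/`sigma_d` stand for the per-state duration means and standard deviations implied by the fitted duration parameters $\theta_i$.

```python
import numpy as np

def rul_estimation(delta_bar, d_t, A0, mu_d, sigma_d, failure_state, max_hops=100):
    """Algorithm 1: average, lower, and upper RUL estimates of (57)-(59).
    delta_bar : (N,) normalized current-state probabilities of (48)
    d_t       : (N,) estimated time already spent in each state, from (20)
    A0        : (N, N) nonrecurrent transition matrix
    mu_d, sigma_d : (N,) duration means and standard deviations per state
    """
    D_avg = D_low = D_up = 0.0
    s = int(np.argmax(delta_bar))                      # s*_t, cf. (34)
    first = True
    hops = 0
    while s != failure_state and hops < max_hops:      # max_hops is only a sketch-level guard
        spent = d_t if first else np.zeros_like(d_t)   # (49)-(51) on the first pass, (54)-(56) afterwards
        D_avg += np.sum((mu_d - spent) * delta_bar)
        D_low += np.sum((mu_d - sigma_d - spent) * delta_bar)
        D_up  += np.sum((mu_d + sigma_d - spent) * delta_bar)
        delta_bar = A0.T @ delta_bar                   # (52)
        s = int(np.argmax(delta_bar))                  # (53)
        first = False
        hops += 1
    return D_avg, D_low, D_up
```

For the left-right models with an absorbing failure state used in Section 5, the propagation of (52) moves the probability mass toward $S_k$, so the loop terminates; the `max_hops` guard is only a safety net for this sketch.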

5.1. Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.

5.1.1. Data Generation. The industrial machine which is the subject of these experiments has been modeled as a left-right parametric HSMM with $N = 5$ states, having state $S_5$ as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective in solving automatic model selection, online condition monitoring, and prediction problems. For this purpose we divided the experiments into two cases, according to the observations being continuous or discrete.

For each of the continuous and discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains $T = 650$ observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM as follows:

$$\pi = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad \mathbf{A}^0 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},$$

$$\Theta_N = \{\theta_1 = [100, 20],\; \theta_2 = [90, 15],\; \theta_3 = [100, 20],\; \theta_4 = [80, 25],\; \theta_5 = [200, 1]\},$$

$$\Theta_G = \{\theta_1 = [500, 0.2],\; \theta_2 = [540, 0.1667],\; \theta_3 = [500, 0.2],\; \theta_4 = [256, 0.3125],\; \theta_5 = [800, 0.005]\},$$

$$\Theta_W = \{\theta_1 = [102, 28],\; \theta_2 = [92, 29],\; \theta_3 = [102, 28],\; \theta_4 = [82, 20],\; \theta_5 = [200, 256]\}, \tag{60}$$

where $\Theta_N$, $\Theta_G$, and $\Theta_W$ are the distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean $\mu_d$ and the variance $\sigma^2_d$ of the Gaussian distribution, the shape $\nu_d$ and the scale $\eta_d$ of the Gamma distribution, and the scale $a_d$ and the shape $b_d$ of the Weibull distribution.

Figure 2: The data generated with the parameters described in Section 5.1.1, for the continuous case (a) and the discrete case (b). Each example shows the hidden state sequence, the state durations, and the observed signal or symbols.

It must be noticed that, as explained in Section 2.1, for state $S_5$, being the absorbing state, the duration parameters $\theta_5$ have no influence on the data, since once the state $S_5$ is reached the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:

$$\mu_1 = \begin{bmatrix} 20 \\ 20 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 20 \\ 35 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} 35 \\ 35 \end{bmatrix}, \quad \mu_5 = \begin{bmatrix} 28 \\ 28 \end{bmatrix},$$

$$U_1 = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix}, \quad U_2 = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix}, \quad U_3 = \begin{bmatrix} 15 & -2 \\ -2 & 15 \end{bmatrix}, \quad U_4 = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \quad U_5 = \begin{bmatrix} 10 & 3 \\ 3 & 10 \end{bmatrix}, \tag{61}$$

while for the discrete case, $L = 7$ distinct observation symbols have been taken into consideration, with the following observation probability distribution:

$$B = \begin{bmatrix} 0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\ 0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\ 0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\ 0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\ 0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1 \end{bmatrix}. \tag{62}$$

An example of simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
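The generative process just described is easy to reproduce; the following sketch (an illustration under the stated left-right structure, shown here for the Gamma duration model and the discrete observation matrix $B$ of (62); the rounding of the sampled sojourn time is one simple discretization choice, not necessarily the one used by the authors) samples one state/observation sequence of length $T$.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hsmm_discrete(A0, B, pi, dur_shape, dur_scale, T=650):
    """Sample a left-right HSMM trajectory with Gamma-distributed sojourn times."""
    N = len(pi)
    states, obs = [], []
    s = rng.choice(N, p=pi)
    while len(states) < T:
        # Sojourn time in the current state (at least one step)
        d = max(1, int(round(rng.gamma(dur_shape[s], dur_scale[s]))))
        for _ in range(min(d, T - len(states))):
            states.append(s)
            obs.append(rng.choice(B.shape[1], p=B[s]))
        if np.isclose(A0[s, s], 1.0):          # absorbing (failure) state: stay until T
            while len(states) < T:
                states.append(s)
                obs.append(rng.choice(B.shape[1], p=B[s]))
        else:
            s = rng.choice(N, p=A0[s])
    return np.array(states), np.array(obs)

A0 = np.array([[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0],
               [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]], dtype=float)
B = np.array([[.8, .2, 0, 0, 0, 0, 0], [.1, .8, .1, 0, 0, 0, 0],
              [0, .1, .8, .1, 0, 0, 0], [0, 0, .1, .7, .1, .1, 0],
              [0, 0, 0, .2, .6, .1, .1]])
pi = np.array([1., 0., 0., 0., 0.])
shape = np.array([500., 540., 500., 256., 800.])       # Theta_G shapes of (60)
scale = np.array([0.2, 0.1667, 0.2, 0.3125, 0.005])    # Theta_G scales of (60)
states, obs = simulate_hsmm_discrete(A0, B, pi, shape, scale)
```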

5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection problem. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states $N$ from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures $M$ in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fit [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters $\lambda^0$, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters $\lambda^*$ corresponding to the minimum AIC value has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.

The obtained results are shown in Figure 3 for both the continuous and discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered as the optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.

5.1.3. Condition Monitoring. The optimal parameters $\lambda^*$ obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, $\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t$. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state $s^*_t = \arg\max_{1 \le i \le N} [\delta_t(i)]$, as specified in (34).

Figure 3: Akaike Information Criterion (AIC) values as a function of the number of states (from 2 to 8) for continuous data with Gaussian (a), Gamma (b), and Weibull (c) duration distributions, and for discrete data with Gaussian (d), Gamma (e), and Weibull (f) duration distributions; each panel compares Gaussian, Gamma, and Weibull duration models. AIC is effective for automatic model selection, since its minimum value identifies the same number of states and duration model used to generate the data.

An example of execution of the condition monitoring experiment is shown in Figure 4, for continuous and discrete observations respectively. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the state estimated by the Viterbi algorithm, and the third display represents the observed time series.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of the testing cases.

Figure 4: Condition monitoring using the Viterbi path: (a) state estimation for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992). HSMMs can be effective in solving condition monitoring problems in time-dependent applications, due to their high accuracy in hidden state recognition.

The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.

5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase we considered the state $S_5$ as the failure state, and the trained parameters $\lambda^*$ of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations $\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t$ up to time $t$. When a new observation is acquired, after the current state probability $\bar{\delta}_t(i)$ is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average as well as the lower and upper bound estimates converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimate decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and lower bounds become narrower as the failure state approaches, and the estimate becomes more precise until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.

To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time $t$, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

$$\mathrm{APE}(t) = \left|\mathrm{RUL}_{\mathrm{real}}(t) - \mathrm{RUL}(t)\right|, \tag{63}$$

where $\mathrm{RUL}_{\mathrm{real}}(t)$ is the (known) value of the RUL at time $t$, while $\mathrm{RUL}(t)$ is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

$$\overline{\mathrm{APE}} = \frac{\sum_{t=1}^{T} \mathrm{APE}(t)}{T}, \tag{64}$$

where $T$ is the length of the testing signal. APE being a prediction error, average values of (64) close to zero correspond to good predictive performance.
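The evaluation metric of (63) and (64) is a simple average of absolute errors over the test sequence; a minimal sketch is:

```python
import numpy as np

def average_ape(rul_real, rul_pred):
    """Average absolute prediction error of (64) over a test sequence."""
    rul_real = np.asarray(rul_real, dtype=float)
    rul_pred = np.asarray(rul_pred, dtype=float)
    return np.mean(np.abs(rul_real - rul_pred))   # (63) averaged over t

# Example: true RUL decreasing linearly versus a noisy prediction (synthetic data)
t = np.arange(300)
print(average_ape(300 - t, 300 - t + np.random.default_rng(1).normal(0, 5, 300)))
```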

The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b) for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), which report the prediction errors obtained for continuous and discrete observations, respectively.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi.

Figure 5: Remaining Useful Lifetime estimation (true, average, upper, and lower RUL over time) for (a) continuous data with Weibull duration distribution and (b) discrete data with Gamma duration distribution. HSMMs effectively solve RUL estimation problems: the prediction converges to the actual RUL value, and its uncertainty decreases as the real failure time approaches.

Table 1: State recognition accuracy (%).

(a) Continuous observations

Test case    Gaussian    Gamma    Weibull
1            99.4        98.5     99.2
2            99.7        98.6     99.5
3            99.4        99.2     99.7
4            98.9        98.9     99.7
5            98.2        98.9     100
6            99.1        98.8     99.7
7            98.5        99.4     99.7
8            99.2        99.1     99.5
9            99.2        98.6     99.7
10           99.2        99.1     99.5
Average      99.1        98.9     99.6

(b) Discrete observations

Test case    Gaussian    Gamma    Weibull
1            97.4        96.7     97.4
2            97.2        97.6     96.5
3            99.4        95.8     96.6
4            98.2        95.3     97.7
5            99.1        97.4     97.5
6            97.8        97.7     97.8
7            95.8        97.2     96.6
8            97.7        96.4     97.2
9            98.9        97.2     98.5
10           99.2        95.6     96.9
Average      98.1        96.7     97.3

This is mainly due to the proposed average state duration estimator of (20), compared with the one of Azimi given by (16).

5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).


Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20).

(a) APE of the RUL estimation for the continuous observation test cases

             Gaussian                    Gamma                       Weibull
Test case    APEavg  APEup   APElow     APEavg  APEup   APElow     APEavg  APEup   APElow
1            5.1     17.0    6.7        14.0    29.0    0.91       4.5     17.0    8.1
2            7.6     19.0    5.0        6.1     21.0    8.5        6.6     19.0    6.1
3            7.7     5.4     19.0       2.9     12.0    17.0       16.0    29.0    3.0
4            9.0     21.0    2.9        7.5     22.0    6.8        6.0     19.0    6.7
5            7.3     19.0    4.7        2.2     14.0    14.0       3.9     17.0    8.7
6            6.5     18.0    5.6        5.1     18.0    10.0       14.0    27.0    2.7
7            4.7     16.0    7.5        4.8     17.0    11.0       1.2     13.0    12.0
8            10.0    22.0    2.9        5.2     18.0    10.0       9.2     22.0    3.9
9            3.1     9.2     14.0       2.0     16.0    13.0       8.2     21.0    4.9
10           6.4     18.0    5.6        7.5     22.0    6.9        3.3     12.0    13.0
Average      6.8     17.0    7.4        5.7     19.0    9.9        7.3     20.0    7.0

(b) APE of the RUL estimation for the discrete observation test cases

             Gaussian                    Gamma                       Weibull
Test case    APEavg  APEup   APElow     APEavg  APEup   APElow     APEavg  APEup   APElow
1            2.1     11.0    14.0       3.1     8.8     14.0       2.4     12.0    13.0
2            2.1     11.0    13.0       11.0    22.0    3.3        19.0    32.0    7.1
3            5.1     17.0    7.6        6.6     18.0    5.1        2.3     14.0    11.0
4            5.9     6.5     18.0       5.2     17.0    6.7        4.2     16.0    9.0
5            3.2     14.0    10.0       8.3     19.0    3.4        12.0    24.0    2.9
6            12.0    24.0    2.7        6.2     18.0    5.2        4.1     8.4     16.0
7            2.9     15.0    9.7        9.3     21.0    2.3        19.0    31.0    6.6
8            15.0    27.0    7.0        7.4     18.0    4.3        4.3     17.0    9.4
9            5.9     18.0    7.7        11.0    23.0    5.5        3.9     16.0    8.8
10           3.5     11.0    14.0       5.5     6.0     16.0       5.2     17.0    7.1
Average      5.7     15.0    10.0       7.4     17.0    6.6        7.7     19.0    9.0

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.

The load profile part applies a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.

Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32].

(a) APE of the RUL estimation for the continuous observation test cases

             Gaussian                    Gamma                       Weibull
Test case    APEavg  APEup   APElow     APEavg  APEup   APElow     APEavg  APEup   APElow
1            57.8    51.0    66.8       26.2    9.7     52.7       25.9    28.4    64.6
2            50.2    44.4    57.7       21.3    17.0    46.9       29.0    19.2    70.8
3            50.3    44.7    57.3       27.1    8.7     56.5       34.5    13.9    73.4
4            51.8    46.0    60.4       21.3    14.3    45.9       34.9    17.1    78.7
5            59.4    53.7    66.2       29.0    9.5     55.4       33.4    15.6    74.9
6            58.0    51.7    67.1       25.8    8.3     54.1       23.1    25.8    66.5
7            59.4    53.6    66.9       18.2    12.5    47.7       36.0    17.1    74.4
8            63.4    55.6    72.3       19.4    15.7    44.1       34.8    17.8    77.0
9            49.1    43.5    57.0       14.5    17.1    43.2       25.1    26.7    67.0
10           54.4    48.4    62.8       23.2    7.9     52.7       24.1    24.5    67.4
Average      55.4    49.3    63.5       22.6    12.1    49.9       30.1    20.6    71.5

(b) APE of the RUL estimation for the discrete observation test cases

             Gaussian                    Gamma                       Weibull
Test case    APEavg  APEup   APElow     APEavg  APEup   APElow     APEavg  APEup   APElow
1            51.4    41.0    62.4       42.4    31.8    53.0       32.6    26.4    73.6
2            49.6    39.9    60.4       59.5    48.3    70.8       31.3    27.6    69.3
3            50.2    38.6    62.3       46.5    35.7    57.4       32.4    25.7    70.2
4            42.2    31.5    53.8       50.1    40.5    60.6       23.7    36.1    60.3
5            44.3    33.9    55.8       47.8    37.4    59.1       36.0    25.6    76.5
6            52.2    43.2    62.7       55.2    44.3    66.9       27.2    31.6    64.3
7            55.0    43.9    66.8       56.0    45.7    67.0       34.7    23.2    74.4
8            50.3    39.0    62.0       60.4    50.5    71.0       35.1    26.4    72.4
9            55.5    47.4    64.0       48.0    37.2    59.5       31.8    22.2    73.6
10           49.0    38.2    60.7       52.1    41.2    63.1       29.4    28.9    68.7
Average      50.0    39.7    61.1       51.8    41.3    62.9       31.4    27.4    70.4

The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1 (1800 rpm and 4000 N)     Condition 2 (1650 rpm and 4200 N)     Condition 3 (1500 rpm and 5000 N)
Bearing        Lifetime [s]           Bearing        Lifetime [s]           Bearing        Lifetime [s]
Bearing1_1     28030                  Bearing2_1     9110                   Bearing3_1     5150
Bearing1_2     8710                   Bearing2_2     7970                   Bearing3_2     16370
Bearing1_3     23750                  Bearing2_3     19550                  Bearing3_3     4340
Bearing1_4     14280                  Bearing2_4     7510
Bearing1_5     24630                  Bearing2_5     23110
Bearing1_6     24480                  Bearing2_6     7010
Bearing1_7     22590                  Bearing2_7     2300

Figure 6: Global overview of the Pronostia experimental platform (rotating module, load module, tested bearing, and data acquisition module) [19].

Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and with a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal was higher than 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in the experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].

In our experiments, we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, namely root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate the RMS as

$$x^{\mathrm{RMS}}_w = \sqrt{\frac{1}{L} \sum_{t=1}^{L} r_w^2(t)}$$

and the kurtosis as

$$x^{\mathrm{KURT}}_w = \frac{(1/L) \sum_{t=1}^{L} \left(r_w(t) - \bar{r}_w\right)^4}{\left((1/L) \sum_{t=1}^{L} \left(r_w(t) - \bar{r}_w\right)^2\right)^2},$$

where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM, after the model selection procedure we implemented a leave-one-out cross validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
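The window-wise RMS and kurtosis features described above can be computed as follows (a minimal sketch for a raw vibration snapshot of 2560 samples; the variable names are illustrative, not taken from the challenge toolkit):

```python
import numpy as np

def window_features(r_w):
    """RMS and kurtosis of one raw-signal window r_w(t), t = 1..L."""
    r_w = np.asarray(r_w, dtype=float)
    rms = np.sqrt(np.mean(r_w ** 2))
    centered = r_w - r_w.mean()
    kurt = np.mean(centered ** 4) / np.mean(centered ** 2) ** 2
    return rms, kurt

# Example on a synthetic 2560-sample snapshot
rng = np.random.default_rng(0)
snapshot = rng.normal(0.0, 1.0, 2560)
print(window_features(snapshot))   # kurtosis of Gaussian noise is close to 3
```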

5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states $N$ from 2 to 6, and (iii) an increasing number of Gaussian mixtures $M$ in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations $\lambda^0$, on the data sets (Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].

Figure 8: Raw vibration data (a) versus the RMS and kurtosis features extracted over consecutive windows (b) for Bearing1_1.

Similarly to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and a mixture of $M = 1$ Gaussian for the observation density.

(B) RUL Estimation. Using the optimal HSMM structure obtained above, we trained it via a leave-one-out cross validation scheme: for condition 1, at each iteration Bearing1_i, $1 \le i \le 7$, was used as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters $\lambda^*_i$ were estimated for the $i$th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time $t$, the average, lower, and upper RUL as specified in (57), (58), and (59), respectively, considering the state $S_4$ as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as well as the lower and upper bound estimates converge to the real RUL as the real failure time approaches, and the uncertainty about the estimate decreases with time.

Figure 9: AIC values as a function of the number of states (from 2 to 6) for Gaussian, Gamma, and Weibull duration models, for condition 1 (a) and condition 2 (b). In both cases the minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ mixture component in the observation density.

Figure 10: RUL estimation (true, average, upper, and lower RUL over time) for (a) Bearing1_7 and (b) Bearing2_6. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.

well as the lower and the upper bound estimations convergeto the real RUL as the real failure time approaches and theuncertainty about the estimation decreases with time

Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2. These are good values considering the high variability of the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1 and further decreases to 14 minutes for condition 2.
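For reference, a minimal sketch of the metric, assuming Equation (64) is the mean absolute difference between the true and the estimated RUL over the whole life of the tested bearing (values in seconds); this is an interpretation of the metric, not the authors' code.

```python
import numpy as np

def average_absolute_prediction_error(true_rul, est_rul):
    # Mean absolute difference between the true and the estimated RUL curves (seconds)
    true_rul, est_rul = np.asarray(true_rul, float), np.asarray(est_rul, float)
    return float(np.mean(np.abs(true_rul - est_rul)))

# Example: an APE of 4549.4 s corresponds to roughly 1 hour and 15 minutes.
```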

6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the small number of parameters that have to be estimated.


Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1

Test bearing    APE_avg    APE_low    APE_up
Bearing1_1      10571.6    12723.0     9414.6
Bearing1_2       4331.2     3815.6     3821.3
Bearing1_3       2997.0     9730.9     6091.2
Bearing1_4       6336.3     2876.6    14871.9
Bearing1_5       1968.9     7448.4    10411.5
Bearing1_6       4253.0     9896.4     9793.7
Bearing1_7       1388.0     7494.3    10088.1
Average          4549.4     7712.2     9213.2

(b) Condition 2

Test bearing    APE_avg    APE_low    APE_up
Bearing2_1       2475.9     5006.5     7287.5
Bearing2_2       1647.3     4497.2     8288.6
Bearing2_3       8877.1     9508.3     7962.1
Bearing2_4       1769.8     4248.6     4982.5
Bearing2_5       8663.1    10490.0    10730.0
Bearing2_6        877.1     3504.7     6687.0
Bearing2_7       3012.5     3866.4     6651.9
Average          3903.3     5874.5     7512.8

Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

Through the experiments performed on simulated data, this paper highlights that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM, in terms of number of states and duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a given event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new operating conditions will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as
\[
\hat{d}_{t+1}(i) = \frac{a_{ii}(\hat{\mathbf{d}}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\bigl(\hat{d}_t(i)+1\bigr). \tag{A.1}
\]

The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution
\[
d_t(i) \sim f(d). \tag{A.2}
\]

We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as
\[
\mathrm{P}\bigl(d_t(i)=d\bigr) = \mathrm{P}\bigl(s_{t-d-1}\neq S_i,\; s_{t-d}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_t,\lambda\bigr). \tag{A.3}
\]

We omit the conditioning on the model parameters $\lambda$ in the following equations, being inherently implied. We are interested in deriving the estimator $\hat{d}_t(i)$ of $d_t(i)$, defined as its expected value (see Equation (15)):
\[
\hat{d}_t(i) = \mathrm{E}\bigl(d_t(i)\mid s_t=S_i,\;\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t\bigr),\quad 1\le i\le N. \tag{A.4}
\]

From the definition of expectation we have
\[
\hat{d}_t(i) = \sum_{d=1}^{t} d\cdot\mathrm{P}\bigl(d_t(i)=d\bigr)
= \sum_{d=1}^{t} d\cdot\mathrm{P}\bigl(s_{t-d-1}\neq S_i,\; s_{t-d}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_t\bigr). \tag{A.5}
\]

For $\hat{d}_{t+1}(i)$ we have
\[
\begin{aligned}
\hat{d}_{t+1}(i) &= \sum_{d=1}^{t+1} d\cdot\mathrm{P}\bigl(s_{t-d}\neq S_i,\; s_{t-d+1}=S_i,\ldots,s_{t}=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)\\
&= \underbrace{\mathrm{P}\bigl(s_{t-1}\neq S_i,\; s_{t}=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)}_{(a)} &&\text{(A.6)}\\
&\quad+ \sum_{d=2}^{t+1} d\cdot\mathrm{P}\bigl(s_{t-d}\neq S_i,\; s_{t-d+1}=S_i,\ldots,s_{t}=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr). &&\text{(A.7)}
\end{aligned}
\]

By noticing that
\[
\begin{aligned}
&\mathrm{P}\bigl(s_{t-d}\neq S_i,\; s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\; s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)\\
&\quad= \frac{\mathrm{P}\bigl(s_{t-d}\neq S_i,\; s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i,\; s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)}{\mathrm{P}\bigl(s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)},
\end{aligned} \tag{A.8}
\]
we can replace the probability of the second term of (A.7) with
\[
\begin{aligned}
&\mathrm{P}\bigl(s_{t-d}\neq S_i,\; s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i,\; s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)\\
&\quad= \underbrace{\mathrm{P}\bigl(s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)}_{(b)} &&\text{(A.9)}\\
&\qquad\cdot\;\mathrm{P}\bigl(s_{t-d}\neq S_i,\; s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\; s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr). &&\text{(A.10)}
\end{aligned}
\]

In the last factor of (A.10) we can omit the information about the current state and observation by observing that
\[
\begin{aligned}
&\mathrm{P}\bigl(s_{t-d}\neq S_i,\; s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\; s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)\\
&\quad\approx \underbrace{\mathrm{P}\bigl(s_{t-d}\neq S_i,\; s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t}\bigr)}_{(c)},
\end{aligned} \tag{A.11}
\]
if the following independencies hold:
\[
s_{t+1} \perp \{s_{t-d+1},\ldots,s_{t-1}\} \mid \{s_t,\;\mathbf{x}_1\ldots\mathbf{x}_t\},\qquad
X_{t+1} \perp \{s_{t-d+1},\ldots,s_{t-1}\} \mid \{s_t,\;\mathbf{x}_1\ldots\mathbf{x}_t\}, \tag{A.12}
\]
where with $\perp$ we denote independency. The independencies of (A.12) hold for HMMs (even without conditioning on $\mathbf{x}_1\ldots\mathbf{x}_t$), but they do not hold for HSMMs, since the state duration (expressed by $s_{t-d+1},\ldots,s_{t-1}$) determines the system evolution. On the other hand, the state duration is partially known through the observations $\mathbf{x}_1\ldots\mathbf{x}_t$; thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A.6), (A.9), and (A.11) we obtain
\[
\begin{aligned}
\hat{d}_{t+1}(i) &= (a) + \sum_{d=2}^{t+1} d\cdot(b)\cdot(c)\\
&= \underbrace{\mathrm{P}\bigl(s_{t-1}\neq S_i,\; s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)}_{\mathrm{P}(A,B\mid C)=\mathrm{P}(A\mid B,C)\cdot\mathrm{P}(B\mid C)}
+ \sum_{d=2}^{t+1} d\cdot\underbrace{\mathrm{P}\bigl(s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)}_{\text{does not depend on }d}\cdot\mathrm{P}\bigl(s_{t-d}\neq S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t}\bigr)\\
&= \mathrm{P}\bigl(s_{t-1}\neq S_i \mid s_t=S_i,\; s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_t\mathbf{x}_{t+1}\bigr)\cdot\mathrm{P}\bigl(s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)\\
&\quad+ \mathrm{P}\bigl(s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)\cdot\sum_{d=2}^{t+1} d\cdot\mathrm{P}\bigl(s_{t-d}\neq S_i,\; s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t}\bigr)\\
&= \mathrm{P}\bigl(s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)\cdot\Biggl[\underbrace{\mathrm{P}\bigl(s_{t-1}\neq S_i \mid s_t=S_i,\; s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_t\mathbf{x}_{t+1}\bigr)}_{\text{for the approximation of (A.11)}}
+ \sum_{d=2}^{t+1} d\cdot\mathrm{P}\bigl(s_{t-d}\neq S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t}\bigr)\Biggr]\\
&= \mathrm{P}\bigl(s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)\cdot\Biggl[\mathrm{P}\bigl(s_{t-1}\neq S_i \mid s_t=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_t\bigr)
+ \sum_{d'=1}^{t}(d'+1)\cdot\mathrm{P}\bigl(s_{t-d'-1}\neq S_i,\; s_{t-d'}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t}\bigr)\Biggr].
\end{aligned} \tag{A.13}
\]

Noticing that
\[
\sum_{d'=1}^{t}\mathrm{P}\bigl(s_{t-d'-1}\neq S_i,\; s_{t-d'}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t}\bigr)
+ \mathrm{P}\bigl(s_{t-1}\neq S_i \mid s_t=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_t\bigr) = 1, \tag{A.14}
\]
because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A.13) as follows:
\[
\hat{d}_{t+1}(i) = \mathrm{P}\bigl(s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)\cdot\bigl(\hat{d}_t(i)+1\bigr). \tag{A.15}
\]

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state $i$ in the previous step.

In order to express (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $\hat{d}_{t+1}(i)$, we can consider the following equality:
\[
\mathrm{P}\bigl(s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)
= \frac{\mathrm{P}\bigl(s_t=S_i,\; s_{t+1}=S_i \mid \mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)}{\underbrace{\mathrm{P}\bigl(s_{t+1}=S_i \mid \mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)}_{\gamma_{t+1}(i)}}. \tag{A.16}
\]

If we consider the terms involved in the probability at the numerator of the right-hand side of (A.16), we have that
\[
\underbrace{\mathbf{x}_1,\ldots,\mathbf{x}_t}_{B} \;\perp\; \underbrace{\mathbf{x}_{t+1}}_{C} \;\Bigm|\; \underbrace{s_t=S_i,\; s_{t+1}=S_i}_{A}. \tag{A.17}
\]

If $B\perp C\mid A$, by the Bayes rule we have that
\[
\mathrm{P}(A\mid C,B) = \frac{\mathrm{P}(C\mid A,B)\cdot\mathrm{P}(A\mid B)}{\mathrm{P}(C\mid B)}. \tag{A.18}
\]

Hence we can rewrite the numerator of the right-hand side of (A.16) as follows:
\[
\begin{aligned}
\mathrm{P}\bigl(s_t=S_i,\; s_{t+1}=S_i \mid \mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)
&= \frac{\mathrm{P}\bigl(s_t=S_i,\; s_{t+1}=S_i \mid \mathbf{x}_1\ldots\mathbf{x}_{t}\bigr)\cdot\overbrace{\mathrm{P}\bigl(\mathbf{x}_{t+1}\mid s_t=S_i,\; s_{t+1}=S_i\bigr)}^{\mathbf{x}_{t+1}\,\perp\, s_t\,\mid\, s_{t+1}}}{\mathrm{P}\bigl(\mathbf{x}_{t+1}\mid \mathbf{x}_1\ldots\mathbf{x}_t\bigr)}\\
&= \frac{\mathrm{P}\bigl(s_{t+1}=S_i \mid s_t=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t}\bigr)\cdot\overbrace{\mathrm{P}\bigl(s_t=S_i \mid \mathbf{x}_1\ldots\mathbf{x}_t\bigr)}^{\gamma_t(i)}\cdot\overbrace{\mathrm{P}\bigl(\mathbf{x}_{t+1}\mid s_{t+1}=S_i\bigr)}^{b_i(\mathbf{x}_{t+1})}}{\mathrm{P}\bigl(\mathbf{x}_{t+1}\mid \mathbf{x}_1\ldots\mathbf{x}_t\bigr)}.
\end{aligned} \tag{A.19}
\]

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as
\[
\mathrm{P}\bigl(s_{t+1}=S_i \mid s_t=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t}\bigr)
= \sum_{\mathbf{d}_t} a_{ii}(\mathbf{d}_t)\cdot\mathrm{P}\bigl(\mathbf{d}_t\mid \mathbf{x}_1\ldots\mathbf{x}_t\bigr)
\approx a_{ii}(\hat{\mathbf{d}}_t), \tag{A.20}
\]
while the denominator of (A.19) can be expressed as follows:
\[
\mathrm{P}\bigl(\mathbf{x}_{t+1}\mid \mathbf{x}_1\ldots\mathbf{x}_t\bigr)
= \frac{\mathrm{P}\bigl(\mathbf{x}_1\ldots\mathbf{x}_t\,\mathbf{x}_{t+1}\bigr)}{\mathrm{P}\bigl(\mathbf{x}_1\ldots\mathbf{x}_t\bigr)}
= \frac{\sum_{i=1}^{N}\alpha_{t+1}(i)}{\sum_{i=1}^{N}\alpha_{t}(i)}. \tag{A.21}
\]

By substituting (A.20) and (A.21) in (A.19), we obtain
\[
\mathrm{P}\bigl(s_t=S_i,\; s_{t+1}=S_i \mid \mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)
= \frac{a_{ii}(\hat{\mathbf{d}}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)}, \tag{A.22}
\]
and then, by combining (A.22) and (A.16), we obtain
\[
\mathrm{P}\bigl(s_t=S_i \mid s_{t+1}=S_i,\;\mathbf{x}_1\ldots\mathbf{x}_{t+1}\bigr)
= \frac{a_{ii}(\hat{\mathbf{d}}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\,\sum_{i=1}^{N}\alpha_{t+1}(i)}. \tag{A.23}
\]

Finally, by substituting (A.23) in (A.15) and considering that
\[
\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)}, \tag{A.24}
\]
we derive the induction formula for $\hat{d}_{t+1}(i)$ in terms of model parameters as
\[
\hat{d}_{t+1}(i) = \frac{a_{ii}(\hat{\mathbf{d}}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\bigl(\hat{d}_t(i)+1\bigr). \tag{A.25}
\]
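To make the induction concrete, the following minimal sketch (illustrative names only, not the authors' implementation) carries the duration estimator of (A.25) along with the forward recursion: at each step the duration-dependent transition matrix is rebuilt from the nonrecurrent matrix A0 and a caller-supplied diagonal of recurrent self-transition probabilities, and the estimated durations are then updated with (A.25).

```python
import numpy as np

def forward_with_duration(A0, b, pi, self_trans):
    """Forward pass of the parametric HSMM with the duration induction of (A.25).

    A0         : (N, N) nonrecurrent transition matrix (zero diagonal).
    b          : (N, T) observation likelihoods b_i(x_t).
    pi         : (N,) initial state distribution.
    self_trans : callable mapping the estimated durations d_hat (N,) to the
                 diagonal recurrent probabilities p_ii(d_hat(i)); it stands in
                 for the parametric duration model of Section 2.1.
    """
    N, T = b.shape
    alpha = np.zeros((N, T))
    d_hat = np.ones((N, T))                    # durations initialized to one
    alpha[:, 0] = pi * b[:, 0]
    for t in range(T - 1):
        P_d = np.diag(self_trans(d_hat[:, t]))
        A_d = P_d + (np.eye(N) - P_d) @ A0     # duration-dependent transition matrix
        alpha[:, t + 1] = (alpha[:, t] @ A_d) * b[:, t + 1]
        a_ii = np.diag(A_d)
        # duration induction of (A.25); small guard against numerical underflow
        d_hat[:, t + 1] = (a_ii * alpha[:, t] * b[:, t + 1]
                           / np.maximum(alpha[:, t + 1], 1e-300)) * (d_hat[:, t] + 1)
    return alpha, d_hat, alpha[:, -1].sum()    # forward variables, durations, P(x | lambda)
```

Because only the average durations are propagated, the per-step cost of this recursion stays comparable to a standard HMM forward pass, which is what the approximation of (A.11) buys.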

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines - prognostics - part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 2006.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "PRONOSTIA: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to PRONOSTIA's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5-8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to PRONOSTIA's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.


Page 4: Hidden Semi-Markov Models for Predictive Maintenance

4 Mathematical Problems in Engineering

represented by a 119873 times 119873 matrix with the diagonal elementsequal to zero defined as

A0= [119886

0

119894119895] =

0 if 119894 = 119895

P (119904119905+1

= 119878119895| 119904

119905= 119878

119894) if 119894 = 119895

(7)

A0 must be specified as a stochastic matrix that is its ele-ments have to satisfy the constraintsum119873

119895=11198860

119894119895= 1 for all 119894

As a consequence of the above decomposition thedynamic of the underlying semi-Markov chain can be definedby specifying only the state-dependent duration parametersΘ and the nonrecurrentmatrixA0 since themodel transitionmatrix can be calculated at each time 119905 using (6) and (7)

Ad119905

= P (d119905) + (I minus P (d

119905))A0

(8)

where I is the identity matrix If we denote the elements ofthe dynamic transition matrix Ad

119905

as 119886119894119895(d

119905) the stochastic

constraint sum119873

119895=1119886119894119895(d

119905) = 1 for all 119894 and 119905 is guaranteed from

the fact that P(d119905) is a diagonal matrix and A0 is a stochastic

matrixFor several applications it is necessary to model the

absorbing state which in the case of industrial equipmentcorresponds to the ldquobrokenrdquo or ldquofailurerdquo state If we denote theabsorbing state as 119878

119896with 119896 isin [1119873] we must fix the 119896th row

of the nonrecurrentmatrixA0 to be 1198860119896119896= 1 and 1198860

119896119894= 0 for all

1 le 119894 le 119873 with 119894 = 119896 By substituting such A0 matrix in (8)it is easy to show that the element 119886

119896119896(d

119905) = 1 and remains

constant for all 119905 while the duration probability parameters120579119896are not influent for the absorbing state 119878

119896 An example of

absorbing state specification will be given in Section 5With respect to the input observation signals in this work

we consider both continuous and discrete data by adaptingthe suitable observationmodel depending on the observationnature In particular for the continuous case we modelthe observations with a multivariate mixture of Gaussiansdistributions This choice presents two main advantages (i)a multivariate model allows to deal with multiple observa-tions at the same time this is often the case of industrialequipments modeling since at each time multiple sensorsrsquomeasurements are available and (ii) mixture of Gaussians hasbeen proved to closely approximate any finite and continuousdensity function [33] Formally if we denote by x

119905the

observation vector at time 119905 and the generic observationvector beingmodeled as x the observation density for the 119895thstate is represented by a finite mixture of119872 gaussians

119887119895(x) =

119872

sum

119898=1

119888119895119898N (x120583

119895119898U

119895119898) 1 le 119895 le 119873 (9)

where 119888119895119898

is the mixture coefficient for the 119898th mixture instate 119878

119895 which satisfies the stochastic constraintsum119872

119898=1119888119895119898

= 1

for 1 le 119895 le 119873 and 119888119895119898

ge 0 for 1 le 119895 le 119873 and 1 le 119898 le 119872while N is the Gaussian density with mean vector 120583

119895119898and

covariance matrix U119895119898

for the 119898th mixture component instate 119895

In case of discrete data we model the observationswithin each state with a nonparametric discrete probability

distribution In particular if 119871 is the number of distinctobservation symbols per state and if we denote the symbolsas 119883 = 119883

1 119883

119871 and the observation at time 119905 as 119909

119905 the

observation symbol probability distribution can be defined asa matrix 119861 = [119887

119895(119897)] of dimensions119873 times 119871 where

119887119895(119897) = P [119909

119905= 119883

119897| 119904

119905= 119878

119895] 1 le 119895 le 119873 1 le 119897 le 119871 (10)

Since the system in each state at each time step can emit oneof the possible 119871 symbols the matrix 119861 is stochastic that is itis constrained to sum119871

119897=1119887119895(119897) = 1 for all 1 le 119895 le 119873

Finally as in the case of HMMs we specify the initial statedistribution 120587 = 120587

119894 which defines the probability of the

starting state as

120587119894= P [119904

1= 119878

119894] 1 le 119894 le 119873 (11)

From the above considerations two different HSMMmodels can be considered In the case of continuous obser-vation 120582 = (A0

Θ 119862 120583 119880 120587) and in the case of discreteobservation the HSMM is characterized by 120582 = (A0

Θ 119861 120587)An example of continuous HSMM with 3 states is shown inFigure 1

22 Learning and Inference Algorithms Let us denote thegeneric sequence of observations being indiscriminatelycontinuous vectors or discrete symbols as x = x

1x2sdot sdot sdot x

119879 in

order to use the defined HSMM model in practice similarlyto the HMM we need to solve three basic problems

(1) Given the observation x and a model 120582 calculate theprobability that the sequence x has been generated bythe model 120582 that is P(x | 120582)

(2) Given the observation x and a model 120582 calculatethe state sequence 119878 = 119904

11199042sdot sdot sdot 119904

119879which have most

probably generated the sequence x(3) Given the observation x find the parameters of the

model 120582 which maximize P(x | 120582)

As in case of HMM solving the above problems requiresusing the forward-backward [13] decoding (Viterbi [46] andForney [47]) and ExpectationMaximization [48] algorithmswhichwill be adapted to theHSMMintroduced in Section 21In the following we also propose a more effective estimatorof the state duration variable 119889

119905(119894) defined in (2)

221 The Forward-Backward Algorithm Given a genericsequence of observations x = x

1x2sdot sdot sdot x

119879 the goal is to

calculate the model likelihood that isP(x | 120582) This quantityis useful for the training procedure where the parametersthat locally maximize the model likelihood are chosen aswell as for classification problems The latter is the case inwhich the observation sequence x has to be mapped to oneof a finite set of 119862 classes represented by a set of HSMMparameters 119871 = 120582

1 120582

119862 The class of x is chosen such

that 120582(x) = argmax120582isin119871

P(119883 | 120582)To calculate the model likelihood we first define the

forward variable at each time 119905 as

120572119905(119894) = P (x

1x2sdot sdot sdot x

119905 119904

119905= 119878

119894| 120582) 1 le 119894 le 119873 (12)

Mathematical Problems in Engineering 5

Hiddenstates

Observationprobabilities

Sojournprobabilities

Time (u) Time (u) Time (u)

Observed Observed Observed

S1 S2 S3

a12 a23

P(oS1) P(oS2) P(oS3)

d3(u)d2(u)d1(u)

Figure 1 Graphical representation of an HSMM

Contrarily toHMMs forHSMMs the state durationmustbe taken into account in the the forward variable calculationConsequently Yu [26] proposed the following inductiveformula

120572119905(119895 119889) =

119863

sum

1198891015840=1

119873

sum

119894=1

(120572119905minus1198891015840 (119894 119889

1015840) 119886

0

119894119895119901119895119895(119889

1015840)

119905

prod

119896=119905minus119889+1

119887119895(x

119896))

1 le 119895 le 119873 1 le 119905 le 119879

(13)

that is the sum of the probabilities of being in the currentstate 119878

119895(at time 119905) for the past 1198891015840 time units (with 1 le 119889

1015840le 119863

and119863 themaximumallowed duration for each state) comingfrom all the possible previous states 119878

119894 1 le 119894 le 119873 and 119894 = 119895

The disadvantage of the above formulation is that asdiscussed in Introduction the specification of the maximumduration 119863 represents a limitation to the modeling general-ization Moreover from (13) it is clear that the computationand memory complexities drastically increase with119863 whichcan be very large inmany applications in particular for onlinefailure prediction

To alleviate this problem Azimi et al [30ndash32] introduceda new forward algorithm for HSMMs that by keeping trackof the estimated average state duration at each iterationhas a computational complexity comparable to the forwardalgorithm for HMMs [13] However the average state dura-tion represents an approximation Consequently the forwardalgorithm of Azimi compared with (13) pays the priceof a lower precision in favor of a (indispensable) bettercomputational efficiency

To calculate the forward variable 120572119905(119895) using Azimirsquos

approach the duration-dependent transition matrix defined

in (8) is taken in consideration in the induction formula of(13) which becomes [30]

120572119905(119895) = [

119873

sum

119894=1

120572119905minus1

(119894) 119886119894119895(d

119905minus1)] 119887

119895(x

119905) (14)

To calculate the above formula the average state duration of(2)must be estimated for each time 119905 bymeans of the variabled119905= [119889

119905(119894)] defined as

119889119905(119894) = E (119889

119905(119894) | x

1x2sdot sdot sdot x

119905 119904

119905= 119878

119894 120582) 1 le 119894 le 119873 (15)

where E denotes the expected value To calculate the abovequantity Azimi et al [30ndash32] use the following formula

d119905= 120574

119905minus1⊙ d

119905minus1+ 1 (16)

where ⊙ represents the element by element product betweentwo matricesvectors and the vector 120574t = [120574

119905(119894)] (the

probability of being in state 119878119894at time 119905 given the observation

sequence and the model parameters) with dimensions119873 times 1

is calculated in terms of 120572119905(119894) as

120574119905(119894) = P (119904

119905= 119878

119894| x

1x2sdot sdot sdot x

119905 120582) =

120572119905(119894)

sum119873

119895=1120572119905(119895)

1 le 119894 le 119873

(17)

Equation (16) is based on the following induction formula[30ndash32] that rules the dynamics of the duration vector whenthe systemrsquos state is known

119889119905(119894) = 119904

119905minus1(119894) sdot 119889

119905minus1(119894) + 1 (18)

where for each 119905 119904119905(119894) is 1 if 119904

119905= 119878

119894 0 otherwise

6 Mathematical Problems in Engineering

A simple example shows that (18) is incorrect assum-ing an HSMM with three states and considering the statesequence (119878

1 119878

1 119878

2 ) the correct sequence of the duration

vector is d1= [1 1 1]

119879 d2= [2 1 1]

119879 and d3= [1 1 1]

119879where the superscript 119879 denotes vector transpose If we apply(18) we obtain d

1= [1 1 1]

119879 d2= [2 1 1]

119879 and d3=

[1 2 1]119879 which is in contradiction with the definition of the

state duration vector given in (2)To calculate the average state duration variable 119889

119905(119894) we

propose a new induction formula that estimates for each time119905 the time spent in the 119894th state prior to 119905 as

119889119905(119894) = P (119904

119905minus1= 119878

119894| 119904

119905= 119878

119894 x

1 x

119905) sdot (119889

119905minus1(119894) + 1) (19)

=119886119894119894(d

119905minus1) sdot 120572

119905minus1(119894) sdot 119887

119894(x

119905)

120572119905(119894)

sdot (119889119905minus1

(119894) + 1)

1 le 119894 le 119873

(20)

The derivation of (20) is given in Appendix The intuitionbehind (19) is that the current average duration is the previousaverage duration plus one weighted with the ldquoamountrdquo of thecurrent state that was already in state 119878

119894in the previous step

Using the proposed (20) the forward algorithm can bespecified as follows

(1) initialization with 1 le 119894 le 119873

1205721(119894) = 120587

119894119887119894(x

1)

1198891(119894) = 1

Ad1

= P (d1) + (I minus P (d

1))A0

(21)

where P(d119894) is estimated using (6)

(2) induction with 1 le 119895 le 119873 and 1 le 119905 le 119879 minus 1

120572119905+1

(119895) = [

119873

sum

119894=1

120572119905(119894) 119886

119894119895(d

119905)] 119887

119895(x

119905+1) (22)

119889119905+1

(119894) =119886119894119894(d

119905) sdot 120572

119905(119894) sdot 119887

119894(x

119905+1)

120572119905+1

(119894)sdot (119889

119905(119894) + 1) (23)

Ad119905+1

= P (d119905+1

) + (I minus P (d119905+1

))A0 (24)

where 119886119894119895(d

119905) are the coefficients of the matrix Ad

119905

(3) termination

P (x | 120582) =119873

sum

119894=1

120572119879(119894) (25)

Similar considerations as the forward procedure can bemade for the backward algorithm which is implemented bydefining the variable 120573

119905(119894) as

120573119905(119894) = P (x

119905+1x119905+2

sdot sdot sdot x119879| 119904

119905= 119878

119894 120582) 1 le 119894 le 119873 (26)

Having estimated the dynamic transition matrix Ad119905

foreach 1 le 119905 le 119879 using (24) the backward variable can becalculated inductively as follows

(1) Initialization

120573119879(119894) = 1 1 le 119894 le 119873 (27)

(2) Induction

120573119905(119894) =

119873

sum

119895=1

119886119894119895(d

119905) 119887

119895(x

119905+1) 120573

119905+1(119895)

119905 = 119879 minus 1 119879 minus 2 1 1 le 119894 le 119873

(28)

Although the variable 120573119905(119894) is not necessary for the

calculation of the model likelihood it will be useful in theparameter reestimation procedure as it will be explained inSection 223

222 The Viterbi Algorithm The Viterbi algorithm [46 47](also known as decoding) allows determining the best statesequence corresponding to a given observation sequence

Formally given a sequence of observation x = x1x2sdot sdot sdot x

119879

the best state sequence 119878lowast = 119904lowast

1119904lowast

2sdot sdot sdot 119904

lowast

119879corresponding to x is

calculated by defining the variable 120575119905(119894) as

120575119905(119894) = max

11990411199042119904119905minus1

P (11990411199042 119904

119905= 119878

119894 x

1x2sdot sdot sdot x

119905| 120582) (29)

The procedure to recursively calculate the variable 120575119905(119894)

and to retrieve the target state sequence (ie the argumentswhich maximize the 120575

119905(119894)rsquos) for the proposed HSMM is a

straightforward extension of theViterbi algorithm forHMMs[13]The only change is the usage in the recursive calculationof 120575

119905(119894) of the dynamic transition matrix Ad

119905

= [119886119894119895(d

119905)]

calculated through (24) The Viterbi algorithm for the intro-duced parametric HSMMs can be summarized as follows

(1) initialization with 1 le 119894 le 119873

1205751(119894) = 120587

119894119887119894(x

1)

1205951(119894) = 0

(30)

(2) recursion with 1 le 119895 le 119873 and 2 le 119905 le 119879

120575119905(119895) = max

1le119894le119873

[120575119905minus1

(119894) 119886119894119895(d

119905)] 119887

119895(x

119905) (31)

120595119905(119895) = argmax

1le119894le119873

[120575119905minus1

(119894) 119886119894119895(d

119905)] (32)

(3) termination

119875lowast= max

1le119894le119873

[120575119879(119894)] (33)

119904lowast

119879= argmax

1le119894le119873

[120575119879(119894)] (34)

where we keep track of the argument maximizing (31) usingthe vector 120595

119905 which tracked back gives the desired best

state sequence

119904lowast

119905= 120595

119905+1(119904

lowast

119905+1) 119905 = 119879 minus 1 119879 minus 2 1 (35)

Mathematical Problems in Engineering 7

223 The Training Algorithm The training algorithm con-sists of estimating themodel parameters from the observationdata As discussed in Section 21 a parametric HSMMis defined by 120582 = (A0

Θ 119862 120583 119880 120587) if the observationsare continuous or 120582 = (A0

Θ 119861 120587) if the observationsare discrete Given a generic observation sequence x =

x1x2sdot sdot sdot x

119879 referred to as training set in the following the

training procedure consists of finding the model parameterset120582lowast which locallymaximizes themodel likelihoodP(x | 120582)

We use the modified Baum-Welch algorithm of Azimiet al [30ndash32] However in our implementation we do notmake assumption on the density function used to model thestate duration and we consider both continuous and discreteobservations

Being a variant of the more general Expectation-Max-imization (EM) algorithm Baum-Welch is an iterative proce-dure which consists of two steps (i) the expectation step inwhich the forward and backward variables are calculated andthe model likelihood is estimated and (ii) the maximizationstep in which the model parameters are updated and usedin the next iteration This process usually starts from arandom guess of the model parameters 1205820 and it is iterateduntil the likelihood function does not improve between twoconsecutive iterations

Similarly to HMMs the reestimation formulas arederived by firstly introducing the variable 120585

119905(119894 119895) which

represents the probability of being in state 119878119894at time 119905 and

in state 119878119895at time 119905 + 1 given the model and the observation

sequence as

120585119905(119894 119895) = P (119904

119905= 119878

119894 119904

119905+1= 119878

119895| x 120582) (36)

However in the HSMM case the variable 120585119905(119894 119895) considers

the duration estimation performed in the forward algorithm(see Equation (24)) Formulated in terms of the forward andbackward variables it is given by

120585119905(119894 119895) = P (119904

119905= 119878

119894 119904

119905+1= 119878

119895| x 120582)

=P (119904

119905= 119878

119894 119904

119905+1= 119878

119895 x | 120582)

P (x | 120582)

=120572119905(119894) 119886

119894119895(d

119905) 119887

119895(x

119905+1) 120573

119905+1(119895)

P (x | 120582)

=120572119905(119894) 119886

119894119895(d

119905) 119887

119895(x

119905+1) 120573

119905+1(119895)

sum119873

119894=1sum

119873

119895=1120572119905(119894) 119886

119894119895(d

119905) 119887

119895(x

119905+1) 120573

119905+1(119895)

(37)

From 120585119905(119894 119895) we can derive the quantity 120574

119905(119894) (already

defined in (17)) representing the probability of being in state119878119894at time 119905 given the observation sequence and the model

parameters

120574119905(119894) =

119873

sum

119895=1

120585119905(119894 119895) (38)

Finally the the reestimation formulas for the parameters120587 and A0 are given by

120587119894= 120574

1(119894) (39)

1198860

119894119895=

(sum119879minus1

119905=1120585119905(119894 119895)) ⊙ 119866

sum119873

119895=1(sum

119879minus1

119905=1120585119905(119894 119895)) ⊙ 119866

(40)

where119866 = [119892119894119895] is a squarematrix of dimensions119873times119873where

119892119894119895= 0 for 119894 = 119895 and 119892

119894119895= 1 for 119894 = 119895 ⊙ represents the element

by element product between two matrices sum119879minus1

119905=1120574119905(119894) is the

expected number of transitions from state 119878119894 andsum119879minus1

119905=1120585119905(119894 119895)

is the expected number of transitions from state 119878119894to state 119878

119895

Equation (39) represents the expected number of timesthat the model starts in state 119878

119894 while (40) represents the

expected number of transitions from state 119878119894to state 119878

119895with

119894 = 119895 over the total expected number of transitions from state119878119894to any other state different from 119878

119894

For the matrix A0 being normalized the stochasticconstraints are satisfied at each iteration that is sum119873

119895=11198860

119894119895= 1

for each 1 le 119894 le 119873 while the estimation of the priorprobability 120587

119894inherently sums up to 1 at each iteration since

it represents the expected frequency in state 119878119894at time 119905 = 1

for each 1 le 119894 le 119873With respect to the reestimation of the state duration

parameters Θ firstly we estimate the mean 120583119894119889

and thevariance 1205902

119894119889of the 119894th state duration for each 1 le 119894 le 119873

from the forward and backward variables and the estimationof the state duration variable

120583119894119889

=sum

119879minus1

119905=1120572119905(119894) (sum

119873

119895=1119895 =119894119886119894119895(119889

119905(119894)) 119887

119895(x

119905+1) 120573

119905+1(119895)) 119889

119905(119894)

sum119879minus1

119905=1120572119905(119894) (sum

119873

119895=1119895 =119894119886119894119895(119889

119905(119894)) 119887

119895(x

119905+1) 120573

119905+1(119895))

(41)

1205902

119894119889= (

119879minus1

sum

119905=1

120572119905(119894)(

119873

sum

119895=1119895 =119894

119886119894119895(119889

119905(119894)) 119887

119895(x

119905+1) 120573

119905+1(119895))

sdot (119889119905(119894) minus 120583

119894119889)2

)

sdot (

119879minus1

sum

119905=1

120572119905(119894)(

119873

sum

119895=1119895 =119894

119886119894119895(119889

119905(119894)) 119887

119895(x

119905+1) 120573

119905+1(119895)))

minus1

(42)

where (41) can be interpreted as the probability of transitionfrom state 119878

119894to 119878

119895with 119894 = 119895 at time 119905weighted by the duration

of state 119878119894at 119905 giving the desired expected value while in (42)

the same quantity is weighted by the squared distance of theduration at time 119905 from its mean giving the estimation of thevariance

Then the parameters of the desired duration distributioncan be estimated from 120583

119894119889and 1205902

119894119889 For example if a Gamma

distribution with shape parameter ] and scale parameter 120578 ischosen to model the state duration the parameters ]

119894and 120578

119894

for each 1 le 119894 le 119873 can be calculated as ]119894= 120583

2

119894119889120590

2

119894119889and

120578119894= 120590

2

119894119889120583

119894119889

8 Mathematical Problems in Engineering

Concerning the observation parameters once the modi-fied forward and backward variables accounting for the stateduration are defined as in (22) and (28) the reestimationformulas are the same as for Hidden Markov Models [13]

In particular for continuous observations the parametersof the Gaussiansrsquo mixture defined in (9) are reestimated byfirstly defining the probability of being in state 119878

119895at time 119905

with the probability of the observation vector x119905evaluated by

the 119896th mixture component as

120574119905(119895 119896) = [

[

120572119905(119895) 120573

119905(119895)

sum119873

119895=1120572119905(119895) 120573

119905(119895)

]

]

sdot [

[

119888119895119896N (x

119905120583

119895119896U

119895119896)

sum119872

119898=1119888119895119898N (x

119905120583

119895119898U

119895119898)

]

]

(43)

By using the former quantity the parameters 119888119895119896 120583

119895119896 andU

119895119896

are reestimated through the following formulas

119888119895119896=

sum119879

119905=1120574119905(119895 119896)

sum119879

119905=1sum

119872

119898=1120574119905(119895 119898)

120583119895119896=sum

119879

119905=1120574119905(119895 119896) sdot x

119905

sum119879

119905=1120574119905(119895 119896)

U119895119896=sum

119879

119905=1120574119905(119895 119896) sdot (x

119905minus 120583

119895119896) (x

119905minus 120583

119895119896)119879

sum119879

119905=1120574119905(119895 119896)

(44)

where superscript 119879 denotes vector transposeFor discrete observations the reestimation formula for

the observation matrix 119887119895(119897) is

119887119895(119897) =

sum119879

119905=1with119909119905=119883119897

120574119905(119895)

sum119879

119905=1120574119905(119895)

(45)

where the quantity 120574119905(119895) which takes into account the dura-

tion dependent forward variable 120572119905(119895) is calculated through

(17)The reader is referred to Rabinerrsquos work [13] for the inter-

pretation on the observation parameters reestimation formu-las

3 AIC-Based Model Selection

In the framework of the proposed parametric HSMMs themodel selection procedure aims to select the optimal numberof hidden states119873 the right duration distribution family andin the case of mixture observation modeling the number ofGaussian mixtures 119872 to be used In this work we make useof the Akaike Information Criterion (AIC) Indeed it hasbeen seen that in case of complex models and in presence ofa limited number of training observations AIC represents asatisfactorymethodology formodel selection outperformingother approaches like Bayesian Information Criterion

In general information criteria are represented as a two-term structure They account for a compromise between

a measure of model fitness which is based on the likelihoodof the model and a penalty term which takes into accountthe model complexity Usually the model complexity ismeasured in terms of the number of parameters that have tobe estimated and in terms of the number of observations

The Akaike Information Criterion is an estimate of theasymptotic value of the expected distance between theunknown true likelihood function of the data and the fittedlikelihood function of the model In particular the AIC canbe expressed as

AIC =minus log 119871 () + 119901

119879 (46)

where 119871() is the likelihood of the model with the estimatedparameters as defined in (25) 119901 is the number of modelparameters and119879 is the length of the observed sequenceThebest model is the one minimizing equation (46)

Concerning 119901 the number of parameters to be estimatedfor a parametric HSMM with119873 states is 119901 = 119901

ℎ+ 119901

119900 where

119901ℎare the parameters of the hidden states layer while 119901

119900are

those of the observation layerIn particular 119901

ℎ= (119873 minus 1) + (119873 minus 1) sdot 119873 + 119911 sdot 119873 where

(i) 119873 minus 1 accounts for the prior probabilities 120587(ii) (119873 minus 1) sdot 119873 accounts for the nonrecurrent transition

matrix A0(iii) 119911sdot119873 accounts for the duration probability being 119911 the

number of parameters 120579 of the duration distribution

Concerning 119901119900 a distinction must be made between

discrete and continuous observations

(i) in the case of discrete observations with 119871 possibleobservable values 119901

119900= (119871 minus 1) sdot 119873 which accounts

for the elements of the observation matrix 119861(ii) if the observations are continuous and a multivariate

mixture of 119872 Gaussians with 119874 variates is used asobservationmodel 119901

119900= [119874 sdot119873sdot119872]+[119874 sdot119874 sdot119873sdot119872]+

[(119872minus 1) sdot 119873] where each term accounts respectivelyfor the mean vector 120583 the covariance matrix 119880 andthe mixture coefficients 119862

4 Remaining Useful Lifetime Estimation

One of the most important advantages of the time modelingof HSMMs is the possibility to effectively face the predictionproblem The knowledge of the state duration distributionsallows the estimation of the remaining time in a certain stateand in general the prediction of the expected time119863 beforeentering in a determinate state

As already mentioned an interesting application of theprediction problem is the Remaining Useful Lifetime (RUL)estimation of industrial equipments Indeed if each stateof an HSMM is mapped to a different condition of anindustrial machine and if the state 119878

119896that represents the

failure condition is identified at each moment the RUL canbe defined as the expected time 119863 to reach the failure state

Mathematical Problems in Engineering 9

S_k. If we assume that the time to failure is a random variable D following a given probability density, we define the RUL at the current time t as

RUL_t = D̂ = E(D | s_{t+D} = S_k, s_{t+D−1} = S_i),  1 ≤ i, k ≤ N,  i ≠ k    (47)

where E denotes the expected value. Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.

The estimation of the current state is performed via the Viterbi path, that is, the variable δ_t = [δ_t(i)]_{1≤i≤N} defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable δ̄_t(i), obtained as

δ̄_t(i) = max_{s_1 s_2 ⋯ s_{t−1}} P(s_t = S_i | s_1 s_2 ⋯ s_{t−1}, x_1 x_2 ⋯ x_t, λ) = δ_t(i) / Σ_{j=1}^{N} δ_t(j),  1 ≤ i ≤ N    (48)

which is an estimate of the probability of being in state S_i at time t. Together with the normalized variable δ̄_t(i), the maximum a posteriori estimate of the current state s*_t is taken into account, according to (34). If s*_t coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimation of the average remaining time in the current state, d_avg(s*_t), is calculated as

d_avg(s*_t) = Σ_{i=1}^{N} (μ_{d_i} − d̂_t(i)) ⊙ δ̄_t(i)    (49)

where with μ_{d_i} we denote the expected value of the duration variable in state S_i, according to the duration distribution specified by the parameters θ_i. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration d̂_t(i) at time t from the expected sojourn time of state S_i, weighting the result by the uncertainty about the current state δ̄_t(i), and finally summing up the contributions from each state. In addition to the average remaining time, a lower and an upper bound can be calculated, based on the standard deviation σ_{d_i} of the duration distribution for state S_i:

d_low(s*_t) = Σ_{i=1}^{N} (μ_{d_i} − σ_{d_i} − d̂_t(i)) ⊙ δ̄_t(i)    (50)

d_up(s*_t) = Σ_{i=1}^{N} (μ_{d_i} + σ_{d_i} − d̂_t(i)) ⊙ δ̄_t(i)    (51)
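As a small numerical illustration of (49)–(51), the weighted sums can be computed directly; the values below are arbitrary illustration numbers (loosely inspired by the simulated example of Section 5.1.1), not results from the experiments.

    import numpy as np

    mu_d = np.array([100.0, 90.0, 100.0, 80.0, 200.0])      # expected sojourn time per state
    sigma_d = np.sqrt(np.array([20.0, 15.0, 20.0, 25.0, 1.0]))
    d_hat_t = np.array([40.0, 0.0, 0.0, 0.0, 0.0])          # estimated time already spent in each state
    delta_bar = np.array([0.9, 0.1, 0.0, 0.0, 0.0])         # normalized Viterbi variable (Eq. 48)

    d_avg = np.sum((mu_d - d_hat_t) * delta_bar)             # Eq. (49)
    d_low = np.sum((mu_d - sigma_d - d_hat_t) * delta_bar)   # Eq. (50)
    d_up  = np.sum((mu_d + sigma_d - d_hat_t) * delta_bar)   # Eq. (51)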

Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimate, as follows:

δ̄_next = [δ̄_{t+d}(i)]_{1≤i≤N} = (A0)^T · δ̄_t    (52)

while the maximum a posteriori estimate of the next state s*_next is calculated as

s*_next = s*_{t+d} = argmax_{1≤i≤N} δ̄_{t+d}(i)    (53)

Again, if s*_{t+d} coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimate of the failure time is D_avg = d_avg(s*_t), calculated at the previous step, with the bound values D_low = d_low(s*_t) and D_up = d_up(s*_t). Otherwise, the estimation of the sojourn time of the next state is calculated as follows:

d_avg(s*_{t+d}) = Σ_{i=1}^{N} μ_{d_i} ⊙ δ̄_{t+d}(i)    (54)

d_low(s*_{t+d}) = Σ_{i=1}^{N} (μ_{d_i} − σ_{d_i}) ⊙ δ̄_{t+d}(i)    (55)

d_up(s*_{t+d}) = Σ_{i=1}^{N} (μ_{d_i} + σ_{d_i}) ⊙ δ̄_{t+d}(i)    (56)

This procedure is repeated until the failure state is encountered in the prediction of the next state. The RUL is then simply obtained by summing all the estimated remaining times in the intermediate states visited before encountering the failure state:

D_avg = Σ d_avg    (57)
D_low = Σ d_low    (58)
D_up = Σ d_up    (59)

Finally, Algorithm 1 details the above described RUL estimation procedure.

5. Experimental Results

To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


(1)  function RulEstimation(x_t, S_k)        ⊳ x_t: the last observation acquired
(2)                                          ⊳ S_k: the failure state
(3)  Initialization:
(4)    D_avg ← 0
(5)    D_low ← 0
(6)    D_up ← 0
(7)  Current state estimation:
(8)    Calculate δ̄_t                         ⊳ Using (48)
(9)    Calculate s*_t                         ⊳ Using (34)
(10)   Calculate d̂_t                         ⊳ Using (20)
(11)   S ← s*_t
(12)  Loop:
(13)  while S ≠ S_k do
(14)    Calculate d_avg                       ⊳ Using (49) or (54)
(15)    Calculate d_low                       ⊳ Using (50) or (55)
(16)    Calculate d_up                        ⊳ Using (51) or (56)
(17)    D_avg ← D_avg + d_avg
(18)    D_low ← D_low + d_low
(19)    D_up ← D_up + d_up
(20)    Calculate δ̄_next                     ⊳ Using (52)
(21)    Calculate s*_next                     ⊳ Using (53)
(22)    S ← s*_next
      end while
(23)  return D_avg, D_low, D_up

Algorithm 1: Remaining Useful Lifetime estimation (pseudocode).
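For concreteness, the following Python sketch mirrors Algorithm 1 under the assumption of a left-right model whose failure state is reachable from every state (so the loop terminates); the variable names are ours, and the routines producing δ̄_t and d̂_t (Equations (48) and (20)) are assumed to be available upstream.

    import numpy as np

    def rul_estimation(delta_bar_t, d_hat_t, A0, mu_d, sigma_d, k):
        # delta_bar_t: normalized Viterbi variable at time t (Eq. 48)
        # d_hat_t:     estimated state durations at time t (Eq. 20)
        # A0:          nonrecurrent transition matrix; mu_d, sigma_d: duration means/stds
        # k:           index of the failure state S_k
        D_avg = D_low = D_up = 0.0
        delta = np.asarray(delta_bar_t, dtype=float).copy()
        S = int(np.argmax(delta))                            # MAP current state (Eq. 34)
        first = True
        while S != k:
            if first:                                        # remaining time in the current state
                D_avg += np.sum((mu_d - d_hat_t) * delta)             # Eq. (49)
                D_low += np.sum((mu_d - sigma_d - d_hat_t) * delta)   # Eq. (50)
                D_up  += np.sum((mu_d + sigma_d - d_hat_t) * delta)   # Eq. (51)
                first = False
            else:                                            # full sojourn time of the predicted state
                D_avg += np.sum(mu_d * delta)                         # Eq. (54)
                D_low += np.sum((mu_d - sigma_d) * delta)             # Eq. (55)
                D_up  += np.sum((mu_d + sigma_d) * delta)             # Eq. (56)
            delta = A0.T @ delta                             # next-state probabilities (Eq. 52)
            S = int(np.argmax(delta))                        # MAP next state (Eq. 53)
        return D_avg, D_low, D_up                            # Eqs. (57)-(59)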

5.1. Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.

5.1.1. Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with N = 5 states, having state S_5 as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective to solve automatic model selection, online condition monitoring, and prediction problems. For this purpose, we divided the experiments into two cases, according to the nature of the observations, being continuous or discrete.

For each of the continuous and the discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains T = 650 observations. The parameters used to generate the data are taken from the work of Lee et al. [15]

and are adapted to obtain an equivalent left-right parametric HSMM, as follows:

π = [1, 0, 0, 0, 0]^T

A0 =
[0 1 0 0 0
 0 0 1 0 0
 0 0 0 1 0
 0 0 0 0 1
 0 0 0 0 1]

Θ_N = {θ_1 = [100, 20], θ_2 = [90, 15], θ_3 = [100, 20], θ_4 = [80, 25], θ_5 = [200, 1]}

Θ_G = {θ_1 = [500, 0.2], θ_2 = [540, 0.1667], θ_3 = [500, 0.2], θ_4 = [256, 0.3125], θ_5 = [800, 0.005]}

Θ_W = {θ_1 = [102, 28], θ_2 = [92, 29], θ_3 = [102, 28], θ_4 = [82, 20], θ_5 = [200, 256]}    (60)

where Θ_N, Θ_G, and Θ_W are the distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean μ_d and the variance σ²_d of the Gaussian distribution, the shape ν_d and the scale η_d of the Gamma distribution, and the scale a_d and the shape b_d of the Weibull distribution.


Figure 2: The data generated with the parameters described in Section 5.1.1, for both the continuous case (a) and the discrete case (b). Each example shows the hidden state sequence, the state durations, and the observed signal (continuous case) or observed symbols (discrete case).

It must be noticed that, as explained in Section 2.1, for state S_5, being the absorbing state, the duration parameters θ_5 have no influence on the data, since once state S_5 is reached the system remains there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:

μ_1 = [20, 20]^T,  μ_2 = [20, 35]^T,  μ_3 = [35, 35]^T,  μ_5 = [28, 28]^T

U_1 = [20 0; 0 20],  U_2 = [15 0; 0 15],  U_3 = [15 −2; −2 15],  U_4 = [5 0; 0 5],  U_5 = [10 3; 3 10]    (61)

while for the discrete case L = 7 distinct observation symbols have been taken into consideration, with the following observation probability distribution:

B =
[0.8 0.2 0   0   0   0   0
 0.1 0.8 0.1 0   0   0   0
 0   0.1 0.8 0.1 0   0   0
 0   0   0.1 0.7 0.1 0.1 0
 0   0   0   0.2 0.6 0.1 0.1]    (62)

An example of simulated data, both for the continuous and the discrete case, is shown in Figure 2, where a Gaussian duration model has been used.
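A minimal generator for such data, assuming the discrete case with Gaussian durations and the parameters of Equations (60) and (62), could look as follows; the sampling routine below is our own illustrative sketch, not the code used for the experiments.

    import numpy as np

    rng = np.random.default_rng(0)

    A0 = np.array([[0, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0],
                   [0, 0, 0, 0, 1],
                   [0, 0, 0, 0, 1]], dtype=float)
    dur_mean = np.array([100.0, 90.0, 100.0, 80.0, 200.0])   # Gaussian duration means
    dur_var = np.array([20.0, 15.0, 20.0, 25.0, 1.0])        # Gaussian duration variances
    B = np.array([[0.8, 0.2, 0,   0,   0,   0,   0],
                  [0.1, 0.8, 0.1, 0,   0,   0,   0],
                  [0,   0.1, 0.8, 0.1, 0,   0,   0],
                  [0,   0,   0.1, 0.7, 0.1, 0.1, 0],
                  [0,   0,   0,   0.2, 0.6, 0.1, 0.1]])

    def sample_sequence(T=650):
        states, obs = [], []
        s = 0                                   # pi puts all the mass on state S_1
        while len(states) < T:
            if s == 4:                          # absorbing failure state: fill the remaining steps
                d = T - len(states)
            else:                               # sample a Gaussian sojourn time (at least 1 step)
                d = max(1, int(round(rng.normal(dur_mean[s], np.sqrt(dur_var[s])))))
            for _ in range(min(d, T - len(states))):
                states.append(s)
                obs.append(rng.choice(7, p=B[s]))
            s = int(rng.choice(5, p=A0[s]))     # nonrecurrent transition
        return np.array(states), np.array(obs)

    states, obs = sample_sequence()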

5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection problem. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states N from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures M in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fitting [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters λ_0, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters λ*, corresponding to the minimum AIC value, has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.

The obtained results are shown in Figure 3, for both the continuous and the discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered as an optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.

5.1.3. Condition Monitoring. The optimal parameters λ* obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, x_1 x_2 ⋯ x_t. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state s*_t = argmax_{1≤i≤N} [δ_t(i)], as specified in (34).


Figure 3: Akaike Information Criterion (AIC) values as a function of the number of states (from 2 to 8) and of the duration distribution (Gaussian, Gamma, Weibull): (a)–(c) continuous data generated with Gaussian, Gamma, and Weibull duration distributions, respectively; (d)–(f) discrete data generated with Gaussian, Gamma, and Weibull duration distributions, respectively. AIC is effective for automatic model selection, since its minimum value identifies the same number of states and duration model used to generate the data.

An example of execution of the condition monitoring experiment is shown in Figure 4, for both continuous and discrete observations. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of


Figure 4: Condition monitoring using the Viterbi path: (a) state estimation for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992). Each panel shows the true state sequence, the estimated states (correct/wrong guesses), and the observations. HSMMs can be effective in solving condition monitoring problems in time-dependent applications, due to their high accuracy in hidden state recognition.

the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.

5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase, we considered the state S_5 as the failure state and the trained parameters λ* of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations x_1 x_2 ⋯ x_t up to time t. When a new observation is acquired, after the current state probability δ̄_t(i) is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average, as well as the lower and the upper bound estimations, converges to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches, and the estimation becomes more precise, until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.

To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time t, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

APE(t) = |RUL_real(t) − RUL(t)|    (63)

where RUL_real(t) is the (known) value of the RUL at time t, while RUL(t) is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

APE̅ = (Σ_{t=1}^{T} APE(t)) / T    (64)

where T is the length of the testing signal. APE̅ being a prediction error, values of (64) close to zero correspond to good predictive performance.
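Equations (63) and (64) amount to the mean absolute error between the true and the predicted RUL curves; a direct implementation (our own sketch, with hypothetical function names) is:

    import numpy as np

    def absolute_prediction_error(rul_real, rul_pred):
        # Eq. (63): per-time absolute prediction error
        return np.abs(np.asarray(rul_real, float) - np.asarray(rul_pred, float))

    def mean_ape(rul_real, rul_pred):
        # Eq. (64): average over the length T of the testing signal
        return absolute_prediction_error(rul_real, rul_pred).mean()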

The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b), for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which the prediction errors obtained for continuous and discrete observations, respectively, are reported.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi. This


Figure 5: HSMMs effectively solve RUL estimation problems: (a) Remaining Useful Lifetime estimation for continuous data and Weibull duration distribution; (b) Remaining Useful Lifetime estimation for discrete data and Gamma duration distribution. Each panel plots the true RUL together with the upper, average, and lower RUL estimates over time. The prediction converges to the actual RUL value, and its uncertainty decreases as the real failure time approaches.

Table 1: State recognition accuracy (%); columns correspond to the duration distribution used.

(a) Continuous observations

Test case   Gaussian   Gamma   Weibull
1           99.4       98.5    99.2
2           99.7       98.6    99.5
3           99.4       99.2    99.7
4           98.9       98.9    99.7
5           98.2       98.9    100
6           99.1       98.8    99.7
7           98.5       99.4    99.7
8           99.2       99.1    99.5
9           99.2       98.6    99.7
10          99.2       99.1    99.5
Average     99.1       98.9    99.6

(b) Discrete observations

Test case   Gaussian   Gamma   Weibull
1           97.4       96.7    97.4
2           97.2       97.6    96.5
3           99.4       95.8    96.6
4           98.2       95.3    97.7
5           99.1       97.4    97.5
6           97.8       97.7    97.8
7           95.8       97.2    96.6
8           97.7       96.4    97.2
9           98.9       97.2    98.5
10          99.2       95.6    96.9
Average     98.1       96.7    97.3

is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi given by (16).

5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing


Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20).

(a) APE of the RUL estimation for the continuous observation test cases

            Gaussian               Gamma                  Weibull
Test case   avg    up     low      avg    up     low      avg    up     low
1           5.1    17.0   6.7      14.0   29.0   0.91     4.5    17.0   8.1
2           7.6    19.0   5.0      6.1    21.0   8.5      6.6    19.0   6.1
3           7.7    5.4    19.0     2.9    12.0   17.0     16.0   29.0   3.0
4           9.0    21.0   2.9      7.5    22.0   6.8      6.0    19.0   6.7
5           7.3    19.0   4.7      2.2    14.0   14.0     3.9    17.0   8.7
6           6.5    18.0   5.6      5.1    18.0   10.0     14.0   27.0   2.7
7           4.7    16.0   7.5      4.8    17.0   11.0     1.2    13.0   12.0
8           10.0   22.0   2.9      5.2    18.0   10.0     9.2    22.0   3.9
9           3.1    9.2    14.0     2.0    16.0   13.0     8.2    21.0   4.9
10          6.4    18.0   5.6      7.5    22.0   6.9      3.3    12.0   13.0
Average     6.8    17.0   7.4      5.7    19.0   9.9      7.3    20.0   7.0

(b) APE of the RUL estimation for the discrete observation test cases

            Gaussian               Gamma                  Weibull
Test case   avg    up     low      avg    up     low      avg    up     low
1           2.1    11.0   14.0     3.1    8.8    14.0     2.4    12.0   13.0
2           2.1    11.0   13.0     11.0   22.0   3.3      19.0   32.0   7.1
3           5.1    17.0   7.6      6.6    18.0   5.1      2.3    14.0   11.0
4           5.9    6.5    18.0     5.2    17.0   6.7      4.2    16.0   9.0
5           3.2    14.0   10.0     8.3    19.0   3.4      12.0   24.0   2.9
6           12.0   24.0   2.7      6.2    18.0   5.2      4.1    8.4    16.0
7           2.9    15.0   9.7      9.3    21.0   2.3      19.0   31.0   6.6
8           15.0   27.0   7.0      7.4    18.0   4.3      4.3    17.0   9.4
9           5.9    18.0   7.7      11.0   23.0   5.5      3.9    16.0   8.8
10          3.5    11.0   14.0     5.5    6.0    16.0     5.2    17.0   7.1
Average     5.7    15.0   10.0     7.4    17.0   6.6      7.7    19.0   9.0

housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox, which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.

The load profile generation part applies a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.

Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32].

(a) APE of the RUL estimation for the continuous observation test cases

            Gaussian               Gamma                  Weibull
Test case   avg    up     low      avg    up     low      avg    up     low
1           57.8   51.0   66.8     26.2   9.7    52.7     25.9   28.4   64.6
2           50.2   44.4   57.7     21.3   17.0   46.9     29.0   19.2   70.8
3           50.3   44.7   57.3     27.1   8.7    56.5     34.5   13.9   73.4
4           51.8   46.0   60.4     21.3   14.3   45.9     34.9   17.1   78.7
5           59.4   53.7   66.2     29.0   9.5    55.4     33.4   15.6   74.9
6           58.0   51.7   67.1     25.8   8.3    54.1     23.1   25.8   66.5
7           59.4   53.6   66.9     18.2   12.5   47.7     36.0   17.1   74.4
8           63.4   55.6   72.3     19.4   15.7   44.1     34.8   17.8   77.0
9           49.1   43.5   57.0     14.5   17.1   43.2     25.1   26.7   67.0
10          54.4   48.4   62.8     23.2   7.9    52.7     24.1   24.5   67.4
Average     55.4   49.3   63.5     22.6   12.1   49.9     30.1   20.6   71.5

(b) APE of the RUL estimation for the discrete observation test cases

            Gaussian               Gamma                  Weibull
Test case   avg    up     low      avg    up     low      avg    up     low
1           51.4   41.0   62.4     42.4   31.8   53.0     32.6   26.4   73.6
2           49.6   39.9   60.4     59.5   48.3   70.8     31.3   27.6   69.3
3           50.2   38.6   62.3     46.5   35.7   57.4     32.4   25.7   70.2
4           42.2   31.5   53.8     50.1   40.5   60.6     23.7   36.1   60.3
5           44.3   33.9   55.8     47.8   37.4   59.1     36.0   25.6   76.5
6           52.2   43.2   62.7     55.2   44.3   66.9     27.2   31.6   64.3
7           55.0   43.9   66.8     56.0   45.7   67.0     34.7   23.2   74.4
8           50.3   39.0   62.0     60.4   50.5   71.0     35.1   26.4   72.4
9           55.5   47.4   64.0     48.0   37.2   59.5     31.8   22.2   73.6
10          49.0   38.2   60.7     52.1   41.2   63.1     29.4   28.9   68.7
Average     50.0   39.7   61.1     51.8   41.3   62.9     31.4   27.4   70.4

The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1 (1800 rpm and 4000 N)    Condition 2 (1650 rpm and 4200 N)    Condition 3 (1500 rpm and 5000 N)
Bearing       Lifetime [s]           Bearing       Lifetime [s]           Bearing       Lifetime [s]
Bearing1_1    28030                  Bearing2_1    9110                   Bearing3_1    5150
Bearing1_2    8710                   Bearing2_2    7970                   Bearing3_2    16370
Bearing1_3    23750                  Bearing2_3    19550                  Bearing3_3    4340
Bearing1_4    14280                  Bearing2_4    7510
Bearing1_5    24630                  Bearing2_5    23110
Bearing1_6    24480                  Bearing2_6    7010
Bearing1_7    22590                  Bearing2_7    2300

Figure 6: Global overview of the Pronostia experimental platform [19], showing the rotating module, the load module, the tested bearing, and the data acquisition module.

Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and with a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal was higher than 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments, by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, …, Bearing1_7 and Bearing2_1, Bearing2_2, …, Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and, since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].

In our experiments, we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots (L = 2560). Let r_w(t) be the raw signal of the w-th window; for each w we estimate the RMS as

x_RMS_w = sqrt((1/L) Σ_{t=1}^{L} r_w(t)²)

and the kurtosis as

x_KURT_w = [(1/L) Σ_{t=1}^{L} (r_w(t) − r̄_w)⁴] / [(1/L) Σ_{t=1}^{L} (r_w(t) − r̄_w)²]²

where r̄_w is the mean of r_w. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM, after the model selection procedure we implemented a leave-one-out cross validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings, using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
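The two features above can be computed per snapshot with a few lines of NumPy; the following sketch is our own (with hypothetical variable names) and assumes the raw horizontal-acceleration signal is available as a one-dimensional array.

    import numpy as np

    def window_features(r, L=2560):
        # Split the raw signal into non-overlapping windows of length L (one snapshot)
        # and compute, for each window, the RMS and the kurtosis defined in the text.
        n_windows = len(r) // L
        features = np.empty((n_windows, 2))
        for w in range(n_windows):
            seg = r[w * L:(w + 1) * L]
            rms = np.sqrt(np.mean(seg ** 2))
            centered = seg - seg.mean()
            kurt = np.mean(centered ** 4) / np.mean(centered ** 2) ** 2
            features[w] = (rms, kurt)
        return features        # rows are the observations x_1, ..., x_n fed to the HSMM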

5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure; secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states N from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations λ_0, on the data sets (Bearing1_1, Bearing1_2, …, Bearing1_7 and Bearing2_1, Bearing2_2, …, Bearing2_7).


Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].

Figure 8: Raw vibration data r(t) (a) versus the RMS and kurtosis features extracted per window (b) for Bearing1_1; the windows of the raw signal are mapped to the observations x_1, x_2, …, x_n.

Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 Gaussian mixture for the observation density.

(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme: for condition 1, at each iteration Bearing1_i, 1 ≤ i ≤ 7, was used as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as


Figure 9: AIC values as a function of the number of states (from 2 to 6) for the Gaussian, Gamma, and Weibull duration models: (a) condition 1, (b) condition 2. In both cases, the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.

Figure 10: RUL estimation for (a) Bearing1_7 and (b) Bearing2_6, showing the true RUL together with the upper, average, and lower RUL estimates over time. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.

well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.

Concerning the quantitative estimation of the predictive performance, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.

6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required parameters to estimate.


Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1

Test bearing   APE_avg   APE_low   APE_up
Bearing1_1     10571.6   12723.0   9414.6
Bearing1_2     4331.2    3815.6    3821.3
Bearing1_3     2997.0    9730.9    6091.2
Bearing1_4     6336.3    2876.6    14871.9
Bearing1_5     1968.9    7448.4    10411.5
Bearing1_6     4253.0    9896.4    9793.7
Bearing1_7     1388.0    7494.3    10088.1
Average        4549.4    7712.2    9213.2

(b) Condition 2

Test bearing   APE_avg   APE_low   APE_up
Bearing2_1     2475.9    5006.5    7287.5
Bearing2_2     1647.3    4497.2    8288.6
Bearing2_3     8877.1    9508.3    7962.1
Bearing2_4     1769.8    4248.6    4982.5
Bearing2_5     8663.1    10490.0   10730.0
Bearing2_6     877.1     3504.7    6687.0
Bearing2_7     3012.5    3866.4    6651.9
Average        3903.3    5874.5    7512.8

Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM, in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM, combined with AIC, can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure, capable of adapting the model parameters to new conditions, will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as

d̂_{t+1}(i) = [a_ii(d̂_t) · α_t(i) · b_i(x_{t+1}) / α_{t+1}(i)] · (d̂_t(i) + 1)    (A1)

The random variable d_t(i) has been defined in Section 2.1 as the duration spent in state i prior to the current time t, assuming that the state at the current time t is i; d_t(i) is sampled from an arbitrary distribution:

d_t(i) ∼ f(d)    (A2)

We can specify the probability that the system has been in state i for d time units prior to the current time t, given the observations and the model parameters λ, and knowing that the current state is i, as

P(d_t(i) = d) = P(s_{t−d−1} ≠ S_i, s_{t−d} = S_i, …, s_{t−1} = S_i | s_t = S_i, x_1, …, x_t, λ)    (A3)

We omit the conditioning on the model parameters λ in the following equations, being inherently implied. We are interested in deriving the estimator d̂_t(i) of d_t(i), defined as its expected value (see Equation (15)):

d̂_t(i) = E(d_t(i) | s_t = S_i, x_1 x_2 ⋯ x_t),  1 ≤ i ≤ N    (A4)

From the definition of expectation we have

d̂_t(i) = Σ_{d=1}^{t} d · P(d_t(i) = d)
        = Σ_{d=1}^{t} d · P(s_{t−d−1} ≠ S_i, s_{t−d} = S_i, …, s_{t−1} = S_i | s_t = S_i, x_1, …, x_t)    (A5)

For d̂_{t+1}(i) we have

d̂_{t+1}(i) = Σ_{d=1}^{t+1} d · P(s_{t−d} ≠ S_i, s_{t−d+1} = S_i, …, s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1})

= P(s_{t−1} ≠ S_i, s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1})    [term (a)]    (A6)

+ Σ_{d=2}^{t+1} d · P(s_{t−d} ≠ S_i, s_{t−d+1} = S_i, …, s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1})    (A7)

By noticing that

P(s_{t−d} ≠ S_i, s_{t−d+1} = S_i, …, s_{t−1} = S_i | s_t = S_i, s_{t+1} = S_i, x_1, …, x_{t+1})
= P(s_{t−d} ≠ S_i, s_{t−d+1} = S_i, …, s_{t−1} = S_i, s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1}) / P(s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1})    (A8)


we can replace the probability in the second term of (A7) with

P(s_{t−d} ≠ S_i, s_{t−d+1} = S_i, …, s_{t−1} = S_i, s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1})

= P(s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1})    [term (b)]    (A9)

· P(s_{t−d} ≠ S_i, s_{t−d+1} = S_i, …, s_{t−1} = S_i | s_t = S_i, s_{t+1} = S_i, x_1, …, x_{t+1})    (A10)

In the last factor of (A10) we can omit the information about the current state and observation by observing that

P(s_{t−d} ≠ S_i, s_{t−d+1} = S_i, …, s_{t−1} = S_i | s_t = S_i, s_{t+1} = S_i, x_1, …, x_{t+1})
≈ P(s_{t−d} ≠ S_i, s_{t−d+1} = S_i, …, s_{t−1} = S_i | s_t = S_i, x_1, …, x_t)    [term (c)]    (A11)

if the following independencies hold:

s_{t+1} ⊥ {s_{t−d+1}, …, s_{t−1}} | s_t, x_1, …, x_t
x_{t+1} ⊥ {s_{t−d+1}, …, s_{t−1}} | s_t, x_1, …, x_t    (A12)

where with ⊥ we denote independence. The relations in (A12) hold for HMMs (even without conditioning on x_1, …, x_t), but they do not hold for HSMMs, since the state duration (expressed by s_{t−d+1}, …, s_{t−1}) determines the system evolution. On the other hand, the state duration is partially known from the observations x_1, …, x_t. Thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A6), (A9), and (A11) we obtain

d̂_{t+1}(i) = (a) + Σ_{d=2}^{t+1} d · (b) · (c)

= P(s_{t−1} ≠ S_i, s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1})    [expanded with P(A, B | C) = P(A | B, C) · P(B | C)]
  + Σ_{d=2}^{t+1} d · P(s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1})    [this factor does not depend on d]
      · P(s_{t−d} ≠ S_i, s_{t−d+1} = S_i, …, s_{t−1} = S_i | s_t = S_i, x_1, …, x_t)

= P(s_{t−1} ≠ S_i | s_t = S_i, s_{t+1} = S_i, x_1, …, x_{t+1}) · P(s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1})
  + P(s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1}) · Σ_{d=2}^{t+1} d · P(s_{t−d} ≠ S_i, s_{t−d+1} = S_i, …, s_{t−1} = S_i | s_t = S_i, x_1, …, x_t)

= P(s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1})
  · [ P(s_{t−1} ≠ S_i | s_t = S_i, x_1, …, x_t)    [using the approximation of (A11)]
      + Σ_{d′=1}^{t} (d′ + 1) · P(s_{t−d′−1} ≠ S_i, s_{t−d′} = S_i, …, s_{t−1} = S_i | s_t = S_i, x_1, …, x_t) ]    (A13)

Noticing that

Σ_{d′=1}^{t} P(s_{t−d′−1} ≠ S_i, s_{t−d′} = S_i, …, s_{t−1} = S_i | s_t = S_i, x_1, …, x_t) + P(s_{t−1} ≠ S_i | s_t = S_i, x_1, …, x_t) = 1    (A14)

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time t, we can rewrite (A13) as follows:

d̂_{t+1}(i) = P(s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1}) · (d̂_t(i) + 1)    (A15)

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted by the "amount" of the current state that was already in state i in the previous step.

In order to express (A15) in terms of model parameters, for an easy numerical calculation of the induction for d̂_{t+1}(i), we can consider the following equality:

P(s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1}) = P(s_t = S_i, s_{t+1} = S_i | x_1, …, x_{t+1}) / P(s_{t+1} = S_i | x_1, …, x_{t+1})    (A16)

where the denominator is γ_{t+1}(i).

If we consider the terms involved in the probability at the numerator of the right-hand side of (A16), we have that

{x_1, …, x_t}  (B)   ⊥   x_{t+1}  (C)   |   {s_t = S_i, s_{t+1} = S_i}  (A)    (A17)


If B ⊥ C | A, by the Bayes rule we have that

P(A | C, B) = P(C | A, B) · P(A | B) / P(C | B)    (A18)

Hence we can rewrite the numerator of the right-hand side of (A16) as follows:

P(s_t = S_i, s_{t+1} = S_i | x_1, …, x_{t+1})

= [P(s_t = S_i, s_{t+1} = S_i | x_1, …, x_t) · P(x_{t+1} | s_t = S_i, s_{t+1} = S_i)] / P(x_{t+1} | x_1, …, x_t)    [using x_{t+1} ⊥ s_t | s_{t+1}]

= [P(s_{t+1} = S_i | s_t = S_i, x_1, …, x_t) · P(s_t = S_i | x_1, …, x_t) · P(x_{t+1} | s_{t+1} = S_i)] / P(x_{t+1} | x_1, …, x_t)    (A19)

where P(s_t = S_i | x_1, …, x_t) = γ_t(i) and P(x_{t+1} | s_{t+1} = S_i) = b_i(x_{t+1}).

The first probability in the numerator of (A19) is the state transition, which can be approximated by considering the average duration as

P(s_{t+1} = S_i | s_t = S_i, x_1, …, x_t) = Σ_{d_t} a_ii(d_t) · P(d_t | x_1, …, x_t) ≈ a_ii(d̂_t)    (A20)

while the denominator of (A19) can be expressed as follows:

P(x_{t+1} | x_1, …, x_t) = P(x_1, …, x_t, x_{t+1}) / P(x_1, …, x_t) = Σ_{i=1}^{N} α_{t+1}(i) / Σ_{i=1}^{N} α_t(i)    (A21)

By substituting (A20) and (A21) into (A19) we obtain

P(s_t = S_i, s_{t+1} = S_i | x_1, …, x_{t+1}) = [a_ii(d̂_t) · γ_t(i) · Σ_{i=1}^{N} α_t(i) · b_i(x_{t+1})] / Σ_{i=1}^{N} α_{t+1}(i)    (A22)

and then, by combining (A22) and (A16), we obtain

P(s_t = S_i | s_{t+1} = S_i, x_1, …, x_{t+1}) = [a_ii(d̂_t) · γ_t(i) · Σ_{i=1}^{N} α_t(i) · b_i(x_{t+1})] / [γ_{t+1}(i) · Σ_{i=1}^{N} α_{t+1}(i)]    (A23)

Finally, by substituting (A23) into (A15) and considering that

γ_t(i) = α_t(i) / Σ_{i=1}^{N} α_t(i)    (A24)

we derive the induction formula for d̂_{t+1}(i) in terms of model parameters as

d̂_{t+1}(i) = [a_ii(d̂_t) · α_t(i) · b_i(x_{t+1}) / α_{t+1}(i)] · (d̂_t(i) + 1)    (A25)
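In code, one step of the induction (A25) reduces to a single elementwise update; the short sketch below uses our own notation and assumes the forward variables and the duration-dependent self-transition probabilities have already been computed (it works on scalars or on NumPy arrays indexed by state).

    def update_duration_estimate(d_hat_t, alpha_t, alpha_t1, a_self, b_x1):
        # d_hat_t: current duration estimates d_hat_t(i)
        # alpha_t, alpha_t1: forward variables at times t and t+1
        # a_self: self-transition probabilities a_ii(d_hat_t)
        # b_x1: observation likelihoods b_i(x_{t+1})
        return (a_self * alpha_t * b_x1 / alpha_t1) * (d_hat_t + 1.0)    # Eq. (A25)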

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka, Japan and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.

[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.

[3] R Dekker ldquoApplications of maintenance optimization modelsa review and analysisrdquo Reliability Engineering amp System Safetyvol 51 no 3 pp 229ndash240 1996

[4] H Wang ldquoA survey of maintenance policies of deterioratingsystemsrdquo European Journal of Operational Research vol 139 no3 pp 469ndash489 2002

[5] AFNOR ldquoCondition monitoring and diagnostics of ma-chinesmdashprognosticsmdashpart 1 generalguidelinesrdquo Tech Rep NFISO 13381-1 2005

[6] F Salfner Event-based failure prediction an extended hiddenmarkov model approach [PhD thesis] Humboldt-Universitatzu Berlin Germany 2008

[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.

[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.

[9] A Sfetsos ldquoShort-term load forecasting with a hybrid cluster-ing algorithmrdquo IEE Proceedings Generation Transmission andDistribution vol 150 no 3 pp 257ndash262 2003

[10] R Vilalta and S Ma ldquoPredicting rare events in temporaldomainsrdquo in Proceedings of the 2nd IEEE International Confer-ence on Data Mining (ICDM rsquo02) pp 474ndash481 December 2002

[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven


methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.

[12] K Goebel B Saha and A Saxena ldquoA comparison of threedata-driven techniques for prognosticsrdquo in Proceedings of the62nd Meeting of the Society for Machinery Failure PreventionTechnology April 2008

[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.

[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.

[15] S Lee L Li and J Ni ldquoOnline degradation assessment andadaptive fault detection usingmodified hidden markov modelrdquoJournal of Manufacturing Science and Engineering vol 132 no2 Article ID 021010 11 pages 2010

[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.

[17] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoEstimation of the remaining useful life by using waveletpacket decomposition and HMMsrdquo in Proceedings of the IEEEAerospace Conference (AERO rsquo11) pp 1ndash10 IEEE ComputerSociety AIAA Big Sky Mont USA March 2011

[18] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA data-driven failure prognostics method based on mixtureof gaussians hidden markov modelsrdquo IEEE Transactions onReliability vol 61 no 2 pp 491ndash503 2012

[19] K Medjaher D A Tobon-Mejia and N Zerhouni ldquoRemaininguseful life estimation of critical components with applicationto bearingsrdquo IEEE Transactions on Reliability vol 61 no 2 pp292ndash302 2012

[20] J Ferguson ldquoVariable duration models for speechrdquo in Proceed-ings of the Symposium on the Application of Hidden MarkovModels to Text and Speech pp 143ndash179 October 1980

[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.

[22] A Kundu T Hines J Phillips B D Huyck and L C VanGuilder ldquoArabic handwriting recognition using variable dura-tion HMMrdquo in Proceedings of the 9th International Conferenceon Document Analysis and Recognition (ICDAR rsquo07) vol 2 pp644ndash648 IEEE Washington DC USA September 2007

[23] M T Johnson ldquoCapacity and complexity of HMM durationmodeling techniquesrdquo IEEE Signal Processing Letters vol 12 no5 pp 407ndash410 2005

[24] J-T Chien and C-H Huang ldquoBayesian learning of speechduration modelsrdquo IEEE Transactions on Speech and AudioProcessing vol 11 no 6 pp 558ndash567 2003

[25] K Laurila ldquoNoise robust speech recognition with state durationconstraintsrdquo in Proceedings of the IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo97) vol 2pp 871ndash874 IEEE Computer Society Munich Germany April1997

[26] S-Z Yu ldquoHidden semi-Markov modelsrdquo Artificial Intelligencevol 174 no 2 pp 215ndash243 2010

[27] S-Z Yu and H Kobayashi ldquoAn efficient forward-backwardalgorithm for an explicit-duration hiddenMarkovmodelrdquo IEEESignal Processing Letters vol 10 no 1 pp 11ndash14 2003

[28] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006

[29] N Wang S-D Sun Z-Q Cai S Zhang and C SayginldquoA hidden semi-markov model with duration-dependent statetransition probabilities for prognosticsrdquoMathematical Problemsin Engineering vol 2014 Article ID 632702 10 pages 2014

[30] M Azimi P Nasiopoulos and R K Ward ldquoOnline identifica-tion of hidden semi-Markov modelsrdquo in Proceedings of the 3rdInternational Symposium on Image and Signal Processing andAnalysis (ISPA rsquo03) vol 2 pp 991ndash996 Rome Italy September2003

[31] M Azimi Data transmission schemes for a new generation ofinteractive digital television [PhD dissertation] Department ofElectrical and Computer EngineeringTheUniversity of BritishColumbia Vancouver Canada 2008

[32] M Azimi P Nasiopoulos and R K Ward ldquoOffline andonline identification of hidden semi-Markov modelsrdquo IEEETransactions on Signal Processing vol 53 no 8 pp 2658ndash26632005

[33] J Q Li and A R Barron ldquoMixture density estimationrdquo inAdvances in Neural Information Processing Systems 12 pp 279ndash285 MIT Press Boston Mass USA 1999

[34] M Dong and D He ldquoA segmental hidden semi-Markov model(HSMM)-based diagnostics and prognostics framework andmethodologyrdquoMechanical Systems and Signal Processing vol 21no 5 pp 2248ndash2266 2007

[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.

[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.

[37] C Fraley and A E Raftery ldquoModel-based clustering discrimi-nant analysis and density estimationrdquo Journal of the AmericanStatistical Association vol 97 no 458 pp 611ndash631 2002

[38] K L Nylund T Asparouhov and B O Muthen ldquoDeciding onthe number of classes in latent class analysis and growthmixturemodeling aMonte Carlo simulation studyrdquo Structural EquationModeling vol 14 no 4 pp 535ndash569 2007

[39] T H Lin and C M Dayton ldquoModel selection information cri-teria for non-nested latent class modelsrdquo Journal of Educationaland Behavioral Statistics vol 22 no 3 pp 249ndash264 1997

[40] O Lukociene and J K Vermunt ldquoDetermining the numberof components in mixture models for hierarchical datardquo inAdvances in Data Analysis Data Handling and Business Intel-ligence Studies in Classification Data Analysis and KnowledgeOrganization pp 241ndash249 Springer New York NY USA 2008

[41] O Cappe E Moulines and T Ryden Inference in HiddenMarkov Models Springer Series in Statistics Springer NewYork NY USA 2005

[42] I L MacDonald and W Zucchini Hidden Markov and OtherModels for Discrete-Valued Time Series Chapman amp HallCRC1997


[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.

[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.

[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.

[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 2006.

[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.

[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.

[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.

[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.

[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.

[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.

[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.

[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.

[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.

[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.

[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.

[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.

[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.


[Figure 1: Graphical representation of an HSMM, showing hidden states S1, S2, S3 connected by transitions a12, a23, each with its observation probability P(o|Si) and its sojourn probability di(u).]

Contrary to HMMs, for HSMMs the state duration must be taken into account in the forward variable calculation. Consequently, Yu [26] proposed the following inductive formula:

$$\alpha_t(j,d) = \sum_{d'=1}^{D}\sum_{i=1}^{N}\left(\alpha_{t-d'}(i,d')\, a^0_{ij}\, p_j(d')\prod_{k=t-d+1}^{t} b_j(\mathbf{x}_k)\right),\qquad 1\le j\le N,\ 1\le t\le T, \tag{13}$$

that is, the sum of the probabilities of being in the current state $S_j$ (at time $t$) for the past $d'$ time units (with $1\le d'\le D$ and $D$ the maximum allowed duration for each state), coming from all the possible previous states $S_i$, $1\le i\le N$ and $i\ne j$.

The disadvantage of the above formulation is that, as discussed in the Introduction, the specification of the maximum duration $D$ represents a limitation to the modeling generalization. Moreover, from (13) it is clear that the computation and memory complexities drastically increase with $D$, which can be very large in many applications, in particular for online failure prediction.

To alleviate this problem, Azimi et al. [30–32] introduced a new forward algorithm for HSMMs that, by keeping track of the estimated average state duration at each iteration, has a computational complexity comparable to the forward algorithm for HMMs [13]. However, the average state duration represents an approximation. Consequently, the forward algorithm of Azimi, compared with (13), pays the price of a lower precision in favor of an (indispensable) better computational efficiency.

To calculate the forward variable $\alpha_t(j)$ using Azimi's approach, the duration-dependent transition matrix defined in (8) is taken into consideration in the induction formula of (13), which becomes [30]

$$\alpha_t(j) = \left[\sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_{t-1})\right] b_j(\mathbf{x}_t). \tag{14}$$

To calculate the above formula, the average state duration of (2) must be estimated for each time $t$ by means of the variable $\bar{\mathbf{d}}_t = [\bar{d}_t(i)]$, defined as

$$\bar{d}_t(i) = \mathbb{E}\left(d_t(i) \mid \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t,\ s_t = S_i,\ \lambda\right),\qquad 1\le i\le N, \tag{15}$$

where $\mathbb{E}$ denotes the expected value. To calculate the above quantity, Azimi et al. [30–32] use the following formula:

$$\bar{\mathbf{d}}_t = \boldsymbol{\gamma}_{t-1} \odot \bar{\mathbf{d}}_{t-1} + 1, \tag{16}$$

where $\odot$ represents the element-by-element product between two matrices/vectors, and the vector $\boldsymbol{\gamma}_t = [\gamma_t(i)]$ (the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters), with dimensions $N\times 1$, is calculated in terms of $\alpha_t(i)$ as

$$\gamma_t(i) = \mathrm{P}\left(s_t = S_i \mid \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t, \lambda\right) = \frac{\alpha_t(i)}{\sum_{j=1}^{N}\alpha_t(j)},\qquad 1\le i\le N. \tag{17}$$

Equation (16) is based on the following induction formula [30–32], which rules the dynamics of the duration vector when the system's state is known:

$$d_t(i) = s_{t-1}(i)\cdot d_{t-1}(i) + 1, \tag{18}$$

where, for each $t$, $s_t(i)$ is 1 if $s_t = S_i$, and 0 otherwise.


A simple example shows that (18) is incorrect: assuming an HSMM with three states and considering the state sequence $(S_1, S_1, S_2, \ldots)$, the correct sequence of the duration vector is $\mathbf{d}_1 = [1\ 1\ 1]^T$, $\mathbf{d}_2 = [2\ 1\ 1]^T$, and $\mathbf{d}_3 = [1\ 1\ 1]^T$, where the superscript $T$ denotes vector transpose. If we apply (18), we obtain $\mathbf{d}_1 = [1\ 1\ 1]^T$, $\mathbf{d}_2 = [2\ 1\ 1]^T$, and $\mathbf{d}_3 = [1\ 2\ 1]^T$, which is in contradiction with the definition of the state duration vector given in (2).

To calculate the average state duration variable $\bar{d}_t(i)$, we propose a new induction formula that estimates, for each time $t$, the time spent in the $i$th state prior to $t$ as

$$\bar{d}_t(i) = \mathrm{P}\left(s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)\cdot\left(\bar{d}_{t-1}(i) + 1\right) \tag{19}$$

$$\phantom{\bar{d}_t(i)} = \frac{a_{ii}(\bar{\mathbf{d}}_{t-1})\cdot\alpha_{t-1}(i)\cdot b_i(\mathbf{x}_t)}{\alpha_t(i)}\cdot\left(\bar{d}_{t-1}(i) + 1\right),\qquad 1\le i\le N. \tag{20}$$

The derivation of (20) is given in the Appendix. The intuition behind (19) is that the current average duration is the previous average duration plus one, weighted by the "amount" of the current state that was already in state $S_i$ in the previous step.

Using the proposed (20), the forward algorithm can be specified as follows.

(1) Initialization, with $1\le i\le N$:

$$\alpha_1(i) = \pi_i\, b_i(\mathbf{x}_1),\qquad \bar{d}_1(i) = 1,\qquad A_{\bar{\mathbf{d}}_1} = P(\bar{\mathbf{d}}_1) + \left(I - P(\bar{\mathbf{d}}_1)\right)A^0, \tag{21}$$

where $P(\bar{\mathbf{d}}_t)$ is estimated using (6).

(2) Induction, with $1\le j\le N$ and $1\le t\le T-1$:

$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N}\alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\right] b_j(\mathbf{x}_{t+1}), \tag{22}$$

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\left(\bar{d}_t(i) + 1\right), \tag{23}$$

$$A_{\bar{\mathbf{d}}_{t+1}} = P(\bar{\mathbf{d}}_{t+1}) + \left(I - P(\bar{\mathbf{d}}_{t+1})\right)A^0, \tag{24}$$

where $a_{ij}(\bar{\mathbf{d}}_t)$ are the coefficients of the matrix $A_{\bar{\mathbf{d}}_t}$.

(3) Termination:

$$\mathrm{P}(\mathbf{x}\mid\lambda) = \sum_{i=1}^{N}\alpha_T(i). \tag{25}$$
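As an illustration, the following is a minimal NumPy sketch of the above forward recursion; it is not the authors' implementation. The helper `duration_self_prob` is a hypothetical placeholder for the self-transition probabilities $P(\bar{\mathbf{d}}_t)$ of (6), whose exact form depends on the chosen duration distribution; observation likelihoods $b_j(\mathbf{x}_t)$ are assumed precomputed, and numerical scaling against underflow is omitted for brevity.

```python
import numpy as np

def forward_hsmm(obs_lik, A0, pi, duration_self_prob):
    """Forward pass of Eqs. (21)-(25) with on-line average-duration tracking.

    obs_lik[t, j]         -- b_j(x_t), observation likelihoods (precomputed)
    A0                    -- (N, N) nonrecurrent transition matrix
    pi                    -- (N,) prior state probabilities
    duration_self_prob(d) -- placeholder for Eq. (6): the N self-transition
                             probabilities given the current average durations
    """
    T, N = obs_lik.shape
    alpha = np.zeros((T, N))
    A_d_seq = np.zeros((T - 1, N, N))       # dynamic transition matrices
    d_bar_seq = np.zeros((T, N))            # average durations d_t(i)
    d_bar = np.ones(N)                      # Eq. (21): d_1(i) = 1
    d_bar_seq[0] = d_bar
    alpha[0] = pi * obs_lik[0]              # Eq. (21): alpha_1(i)

    for t in range(T - 1):
        P = np.diag(duration_self_prob(d_bar))
        A_d = P + (np.eye(N) - P) @ A0      # Eqs. (21)/(24): A_{d_t}
        A_d_seq[t] = A_d
        alpha[t + 1] = (alpha[t] @ A_d) * obs_lik[t + 1]            # Eq. (22)
        d_bar = (np.diag(A_d) * alpha[t] * obs_lik[t + 1]
                 / np.maximum(alpha[t + 1], 1e-300)) * (d_bar + 1)  # Eq. (23)
        d_bar_seq[t + 1] = d_bar

    return alpha, A_d_seq, d_bar_seq, alpha[-1].sum()               # Eq. (25)
```

The returned transition matrices and average durations are reused by the backward, Viterbi, and reestimation sketches given below.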

Similar considerations as for the forward procedure can be made for the backward algorithm, which is implemented by defining the variable $\beta_t(i)$ as

$$\beta_t(i) = \mathrm{P}\left(\mathbf{x}_{t+1}\mathbf{x}_{t+2}\cdots\mathbf{x}_T \mid s_t = S_i, \lambda\right),\qquad 1\le i\le N. \tag{26}$$

Having estimated the dynamic transition matrix $A_{\bar{\mathbf{d}}_t}$ for each $1\le t\le T$ using (24), the backward variable can be calculated inductively as follows.

(1) Initialization:

$$\beta_T(i) = 1,\qquad 1\le i\le N. \tag{27}$$

(2) Induction:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j),\qquad t = T-1, T-2, \ldots, 1,\quad 1\le i\le N. \tag{28}$$

Although the variable $\beta_t(i)$ is not necessary for the calculation of the model likelihood, it will be useful in the parameter reestimation procedure, as explained in Section 2.2.3.
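A corresponding sketch of the backward recursion, under the same assumptions as the forward sketch above and reusing the duration-dependent transition matrices saved during the forward pass:

```python
import numpy as np

def backward_hsmm(obs_lik, A_d_seq):
    """Backward pass of Eqs. (26)-(28); A_d_seq[t] are the dynamic
    transition matrices stored during the forward pass."""
    T, N = obs_lik.shape
    beta = np.ones((T, N))                                     # Eq. (27)
    for t in range(T - 2, -1, -1):
        beta[t] = A_d_seq[t] @ (obs_lik[t + 1] * beta[t + 1])  # Eq. (28)
    return beta
```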

2.2.2. The Viterbi Algorithm. The Viterbi algorithm [46, 47] (also known as decoding) allows determining the best state sequence corresponding to a given observation sequence. Formally, given a sequence of observations $\mathbf{x} = \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_T$, the best state sequence $S^* = s^*_1 s^*_2\cdots s^*_T$ corresponding to $\mathbf{x}$ is calculated by defining the variable $\delta_t(i)$ as

$$\delta_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} \mathrm{P}\left(s_1 s_2 \cdots s_t = S_i,\ \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t \mid \lambda\right). \tag{29}$$

The procedure to recursively calculate the variable $\delta_t(i)$ and to retrieve the target state sequence (i.e., the arguments which maximize the $\delta_t(i)$'s) for the proposed HSMM is a straightforward extension of the Viterbi algorithm for HMMs [13]. The only change is the usage, in the recursive calculation of $\delta_t(i)$, of the dynamic transition matrix $A_{\bar{\mathbf{d}}_t} = [a_{ij}(\bar{\mathbf{d}}_t)]$ calculated through (24). The Viterbi algorithm for the introduced parametric HSMMs can be summarized as follows.

(1) Initialization, with $1\le i\le N$:

$$\delta_1(i) = \pi_i\, b_i(\mathbf{x}_1),\qquad \psi_1(i) = 0. \tag{30}$$

(2) Recursion, with $1\le j\le N$ and $2\le t\le T$:

$$\delta_t(j) = \max_{1\le i\le N}\left[\delta_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_t)\right] b_j(\mathbf{x}_t), \tag{31}$$

$$\psi_t(j) = \arg\max_{1\le i\le N}\left[\delta_{t-1}(i)\, a_{ij}(\bar{\mathbf{d}}_t)\right]. \tag{32}$$

(3) Termination:

$$P^* = \max_{1\le i\le N}\left[\delta_T(i)\right], \tag{33}$$

$$s^*_T = \arg\max_{1\le i\le N}\left[\delta_T(i)\right], \tag{34}$$

where we keep track of the argument maximizing (31) using the vector $\psi_t$, which, tracked back, gives the desired best state sequence:

$$s^*_t = \psi_{t+1}\left(s^*_{t+1}\right),\qquad t = T-1, T-2, \ldots, 1. \tag{35}$$
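The sketch below illustrates the decoding step in NumPy. It is not the authors' code; in particular, it assumes the dynamic transition matrix applied at each step is the one produced by the forward recursion for that transition (the paper's indexing of $A_{\bar{\mathbf{d}}_t}$ in (31) is taken to align with (22)), and it works in the probability domain, so a log-domain version would be preferable for long sequences.

```python
import numpy as np

def viterbi_hsmm(obs_lik, A_d_seq, pi):
    """Viterbi decoding of Eqs. (29)-(35) using the duration-dependent
    transition matrices A_d_seq[t] from the forward pass."""
    T, N = obs_lik.shape
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * obs_lik[0]                               # Eq. (30)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A_d_seq[t - 1]      # delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)                       # Eq. (32)
        delta[t] = scores.max(axis=0) * obs_lik[t]           # Eq. (31)
    states = np.zeros(T, dtype=int)
    states[-1] = delta[-1].argmax()                          # Eq. (34)
    for t in range(T - 2, -1, -1):
        states[t] = psi[t + 1, states[t + 1]]                # Eq. (35)
    return states, delta
```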


2.2.3. The Training Algorithm. The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by $\lambda = (A^0, \Theta, C, \mu, U, \pi)$ if the observations are continuous, or $\lambda = (A^0, \Theta, B, \pi)$ if the observations are discrete. Given a generic observation sequence $\mathbf{x} = \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_T$, referred to as the training set in the following, the training procedure consists of finding the model parameter set $\lambda^*$ which locally maximizes the model likelihood $\mathrm{P}(\mathbf{x}\mid\lambda)$.

We use the modified Baum-Welch algorithm of Azimi et al. [30–32]. However, in our implementation we make no assumption on the density function used to model the state duration, and we consider both continuous and discrete observations.

Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure which consists of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters $\lambda^0$, and it is iterated until the likelihood function does not improve between two consecutive iterations.

Similarly to HMMs, the reestimation formulas are derived by firstly introducing the variable $\xi_t(i,j)$, which represents the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$, given the model and the observation sequence:

$$\xi_t(i,j) = \mathrm{P}\left(s_t = S_i,\ s_{t+1} = S_j \mid \mathbf{x}, \lambda\right). \tag{36}$$

However, in the HSMM case the variable $\xi_t(i,j)$ considers the duration estimation performed in the forward algorithm (see (24)). Formulated in terms of the forward and backward variables, it is given by

$$\xi_t(i,j) = \frac{\mathrm{P}\left(s_t = S_i,\ s_{t+1} = S_j,\ \mathbf{x} \mid \lambda\right)}{\mathrm{P}(\mathbf{x}\mid\lambda)} = \frac{\alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_t(i)\, a_{ij}(\bar{\mathbf{d}}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}. \tag{37}$$

From $\xi_t(i,j)$ we can derive the quantity $\gamma_t(i)$ (already defined in (17)), representing the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters:

$$\gamma_t(i) = \sum_{j=1}^{N}\xi_t(i,j). \tag{38}$$

Finally, the reestimation formulas for the parameters $\pi$ and $A^0$ are given by

$$\pi_i = \gamma_1(i), \tag{39}$$

$$a^0_{ij} = \frac{\left(\sum_{t=1}^{T-1}\xi_t(i,j)\right)\odot G}{\sum_{j=1}^{N}\left(\sum_{t=1}^{T-1}\xi_t(i,j)\right)\odot G}, \tag{40}$$

where $G = [g_{ij}]$ is a square matrix of dimensions $N\times N$ with $g_{ij} = 0$ for $i = j$ and $g_{ij} = 1$ for $i\ne j$, and $\odot$ represents the element-by-element product between two matrices. Here $\sum_{t=1}^{T-1}\gamma_t(i)$ is the expected number of transitions from state $S_i$, and $\sum_{t=1}^{T-1}\xi_t(i,j)$ is the expected number of transitions from state $S_i$ to state $S_j$.

Equation (39) represents the expected number of times that the model starts in state $S_i$, while (40) represents the expected number of transitions from state $S_i$ to state $S_j$, with $i\ne j$, over the total expected number of transitions from state $S_i$ to any other state different from $S_i$.

For the matrix $A^0$, being normalized, the stochastic constraints are satisfied at each iteration, that is, $\sum_{j=1}^{N} a^0_{ij} = 1$ for each $1\le i\le N$, while the estimation of the prior probability $\pi_i$ inherently sums up to 1 at each iteration, since it represents the expected frequency in state $S_i$ at time $t = 1$, for each $1\le i\le N$.
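The following sketch shows how (36)–(40) could be computed with NumPy from the outputs of the forward and backward sketches above; it is an illustrative E-step/M-step fragment under those assumptions, not the authors' implementation.

```python
import numpy as np

def reestimate_pi_A0(alpha, beta, obs_lik, A_d_seq):
    """xi_t(i,j) and gamma_t(i) of Eqs. (36)-(38), and the updates of
    pi (Eq. (39)) and of the nonrecurrent matrix A0 (Eq. (40))."""
    T, N = alpha.shape
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        num = (alpha[t][:, None] * A_d_seq[t]
               * obs_lik[t + 1][None, :] * beta[t + 1][None, :])
        xi[t] = num / num.sum()                        # Eq. (37)
    gamma = xi.sum(axis=2)                             # Eq. (38)
    pi_new = gamma[0]                                  # Eq. (39)
    G = 1.0 - np.eye(N)                                # g_ij = 0 iff i == j
    num = xi.sum(axis=0) * G                           # expected off-diagonal transitions
    A0_new = num / num.sum(axis=1, keepdims=True)      # Eq. (40)
    return xi, gamma, pi_new, A0_new
```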

With respect to the reestimation of the state duration parameters $\Theta$, we firstly estimate the mean $\mu_{id}$ and the variance $\sigma^2_{id}$ of the $i$th state duration, for each $1\le i\le N$, from the forward and backward variables and the estimation of the state duration variable:

$$\mu_{id} = \frac{\sum_{t=1}^{T-1}\alpha_t(i)\left(\sum_{j=1, j\ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)\bar{d}_t(i)}{\sum_{t=1}^{T-1}\alpha_t(i)\left(\sum_{j=1, j\ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)}, \tag{41}$$

$$\sigma^2_{id} = \frac{\sum_{t=1}^{T-1}\alpha_t(i)\left(\sum_{j=1, j\ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)\left(\bar{d}_t(i) - \mu_{id}\right)^2}{\sum_{t=1}^{T-1}\alpha_t(i)\left(\sum_{j=1, j\ne i}^{N} a_{ij}(\bar{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)}, \tag{42}$$

where (41) can be interpreted as the probability of a transition from state $S_i$ to $S_j$, with $i\ne j$, at time $t$, weighted by the duration of state $S_i$ at $t$, giving the desired expected value, while in (42) the same quantity is weighted by the squared distance of the duration at time $t$ from its mean, giving the estimation of the variance.

Then the parameters of the desired duration distribution can be estimated from $\mu_{id}$ and $\sigma^2_{id}$. For example, if a Gamma distribution with shape parameter $\nu$ and scale parameter $\eta$ is chosen to model the state duration, the parameters $\nu_i$ and $\eta_i$, for each $1\le i\le N$, can be calculated as $\nu_i = \mu^2_{id}/\sigma^2_{id}$ and $\eta_i = \sigma^2_{id}/\mu_{id}$.
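As a rough sketch (not the authors' code), the weighted duration moments of (41)–(42) and the Gamma moment-matching step could be written as follows; it assumes the average durations $\bar{d}_t(i)$ and the dynamic transition matrices were recorded during the forward pass, and it interprets $a_{ij}(\bar{d}_t(i))$ as the corresponding entries of $A_{\bar{\mathbf{d}}_t}$.

```python
import numpy as np

def reestimate_duration_gamma(alpha, beta, obs_lik, A_d_seq, d_bar_seq):
    """Weighted duration mean/variance of Eqs. (41)-(42) and conversion to
    Gamma shape/scale by moment matching (nu = mu^2/sigma^2, eta = sigma^2/mu)."""
    T, N = alpha.shape
    off = 1.0 - np.eye(N)
    # w[t, i] = alpha_t(i) * sum_{j != i} a_ij(d_t) b_j(x_{t+1}) beta_{t+1}(j)
    w = np.zeros((T - 1, N))
    for t in range(T - 1):
        w[t] = alpha[t] * ((A_d_seq[t] * off) @ (obs_lik[t + 1] * beta[t + 1]))
    mu_d = (w * d_bar_seq[:-1]).sum(axis=0) / w.sum(axis=0)                  # Eq. (41)
    var_d = (w * (d_bar_seq[:-1] - mu_d) ** 2).sum(axis=0) / w.sum(axis=0)   # Eq. (42)
    shape = mu_d ** 2 / var_d   # nu_i
    scale = var_d / mu_d        # eta_i
    return mu_d, var_d, shape, scale
```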


Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the reestimation formulas are the same as for Hidden Markov Models [13].

In particular, for continuous observations, the parameters of the Gaussians' mixture defined in (9) are reestimated by firstly defining the probability of being in state $S_j$ at time $t$, with the probability of the observation vector $\mathbf{x}_t$ evaluated by the $k$th mixture component, as

$$\gamma_t(j,k) = \left[\frac{\alpha_t(j)\,\beta_t(j)}{\sum_{j=1}^{N}\alpha_t(j)\,\beta_t(j)}\right]\cdot\left[\frac{c_{jk}\,\mathcal{N}(\mathbf{x}_t;\ \mu_{jk}, U_{jk})}{\sum_{m=1}^{M} c_{jm}\,\mathcal{N}(\mathbf{x}_t;\ \mu_{jm}, U_{jm})}\right]. \tag{43}$$

By using the former quantity, the parameters $c_{jk}$, $\mu_{jk}$, and $U_{jk}$ are reestimated through the following formulas:

$$c_{jk} = \frac{\sum_{t=1}^{T}\gamma_t(j,k)}{\sum_{t=1}^{T}\sum_{m=1}^{M}\gamma_t(j,m)},\qquad \mu_{jk} = \frac{\sum_{t=1}^{T}\gamma_t(j,k)\cdot\mathbf{x}_t}{\sum_{t=1}^{T}\gamma_t(j,k)},\qquad U_{jk} = \frac{\sum_{t=1}^{T}\gamma_t(j,k)\cdot(\mathbf{x}_t - \mu_{jk})(\mathbf{x}_t - \mu_{jk})^T}{\sum_{t=1}^{T}\gamma_t(j,k)}, \tag{44}$$

where the superscript $T$ denotes vector transpose.

For discrete observations, the reestimation formula for the observation matrix $b_j(l)$ is

$$b_j(l) = \frac{\sum_{t=1,\ x_t = X_l}^{T}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}, \tag{45}$$

where the quantity $\gamma_t(j)$, which takes into account the duration-dependent forward variable $\alpha_t(j)$, is calculated through (17). The reader is referred to Rabiner's work [13] for the interpretation of the observation parameters reestimation formulas.
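For the discrete case, a minimal sketch of the update (45) is given below; `obs_symbols` is a hypothetical array of symbol indices aligned with the time axis of the state posteriors.

```python
import numpy as np

def reestimate_discrete_B(gamma, obs_symbols, L):
    """Discrete observation matrix update of Eq. (45).
    gamma[t, j]    -- state posteriors (Eq. (17)/(38))
    obs_symbols[t] -- index (0..L-1) of the symbol observed at time t"""
    obs_symbols = np.asarray(obs_symbols)
    N = gamma.shape[1]
    B = np.zeros((N, L))
    for l in range(L):
        B[:, l] = gamma[obs_symbols == l].sum(axis=0)
    return B / gamma.sum(axis=0)[:, None]
```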

3. AIC-Based Model Selection

In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states $N$, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures $M$ to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been shown that, in the case of complex models and in the presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches like the Bayesian Information Criterion.

In general, information criteria are represented as a two-term structure. They account for a compromise between a measure of model fitness, which is based on the likelihood of the model, and a penalty term, which takes into account the model complexity. Usually the model complexity is measured in terms of the number of parameters that have to be estimated and in terms of the number of observations.

The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as

$$\mathrm{AIC} = \frac{-\log L(\hat{\lambda}) + p}{T}, \tag{46}$$

where $L(\hat{\lambda})$ is the likelihood of the model with the estimated parameters as defined in (25), $p$ is the number of model parameters, and $T$ is the length of the observed sequence. The best model is the one minimizing (46).

Concerning $p$, the number of parameters to be estimated for a parametric HSMM with $N$ states is $p = p_h + p_o$, where $p_h$ are the parameters of the hidden states layer, while $p_o$ are those of the observation layer.

In particular, $p_h = (N-1) + (N-1)\cdot N + z\cdot N$, where

(i) $N-1$ accounts for the prior probabilities $\pi$;
(ii) $(N-1)\cdot N$ accounts for the nonrecurrent transition matrix $A^0$;
(iii) $z\cdot N$ accounts for the duration probability, $z$ being the number of parameters $\theta$ of the duration distribution.

Concerning $p_o$, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with $L$ possible observable values, $p_o = (L-1)\cdot N$, which accounts for the elements of the observation matrix $B$;
(ii) if the observations are continuous and a multivariate mixture of $M$ Gaussians with $O$ variates is used as observation model, $p_o = [O\cdot N\cdot M] + [O\cdot O\cdot N\cdot M] + [(M-1)\cdot N]$, where each term accounts, respectively, for the mean vectors $\mu$, the covariance matrices $U$, and the mixture coefficients $C$.
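A small sketch of the parameter count and the criterion value is given below; note that the normalization of (46) is taken as reconstructed above, so the exact scaling is an assumption rather than a definitive implementation.

```python
def aic_hsmm(log_lik, N, T, z, discrete=False, L=None, M=None, O=None):
    """AIC of Eq. (46) with the parameter count of Section 3.
    z -- number of parameters of the chosen duration law (e.g. 2 for
         Gaussian, Gamma, or Weibull)."""
    p_h = (N - 1) + (N - 1) * N + z * N
    if discrete:
        p_o = (L - 1) * N
    else:
        p_o = O * N * M + O * O * N * M + (M - 1) * N
    p = p_h + p_o
    return (-log_lik + p) / T   # assumed (-log L + p)/T form of Eq. (46)
```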

4. Remaining Useful Lifetime Estimation

One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time $\bar{D}$ before entering a determinate state.

As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state $S_k$ that represents the failure condition is identified, at each moment the RUL can be defined as the expected time $\bar{D}$ to reach the failure state $S_k$. If we assume that the time to failure is a random variable $D$ following a determinate probability density, we define the RUL at the current time $t$ as

$$\mathrm{RUL}_t = \bar{D} = \mathbb{E}(D),\qquad s_{t+\bar{D}} = S_k,\quad s_{t+\bar{D}-1} = S_i,\qquad 1\le i,k\le N,\ i\ne k, \tag{47}$$

where $\mathbb{E}$ denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.

The estimation of the current state is performed via the Viterbi path, that is, the variable $\delta_t = [\delta_t(i)]_{1\le i\le N}$ defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable $\hat{\delta}_t(i)$, obtained as

$$\hat{\delta}_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} \mathrm{P}\left(s_t = S_i \mid s_1 s_2\cdots s_{t-1},\ \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t,\ \lambda\right) = \frac{\delta_t(i)}{\sum_{j=1}^{N}\delta_t(j)},\qquad 1\le i\le N, \tag{48}$$

that is, an estimate of the probability of being in state $S_i$ at time $t$.

Together with the normalized variable $\hat{\delta}_t(i)$, the maximum a posteriori estimate of the current state $s^*_t$ is taken into account, according to (34). If $s^*_t$ coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimation of the average remaining time in the current state, $d_{\mathrm{avg}}(s^*_t)$, is calculated as

$$d_{\mathrm{avg}}(s^*_t) = \sum_{i=1}^{N}\left(\mu_{d_i} - \bar{d}_t(i)\right)\odot\hat{\delta}_t(i), \tag{49}$$

where $\mu_{d_i}$ denotes the expected value of the duration variable in state $S_i$, according to the duration distribution specified by the parameters $\theta_i$. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration $\bar{d}_t(i)$ at time $t$ from the expected sojourn time of state $S_i$, weighting the result using the uncertainty about the current state $\hat{\delta}_t(i)$, and finally summing up all the contributions from each state.

In addition to the average remaining time, a lower and an upper bound value can be calculated, based on the standard deviation $\sigma_{d_i}$ of the duration distribution for state $S_i$:

$$d_{\mathrm{low}}(s^*_t) = \sum_{i=1}^{N}\left(\mu_{d_i} - \sigma_{d_i} - \bar{d}_t(i)\right)\odot\hat{\delta}_t(i), \tag{50}$$

$$d_{\mathrm{up}}(s^*_t) = \sum_{i=1}^{N}\left(\mu_{d_i} + \sigma_{d_i} - \bar{d}_t(i)\right)\odot\hat{\delta}_t(i). \tag{51}$$

Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimation, as follows:

$$\hat{\delta}_{\mathrm{next}} = \left[\hat{\delta}_{t+d}(i)\right]_{1\le i\le N} = \left(A^0\right)^T\cdot\hat{\boldsymbol{\delta}}_t, \tag{52}$$

while the maximum a posteriori estimate of the next state $s^*_{\mathrm{next}}$ is calculated as

$$s^*_{\mathrm{next}} = s^*_{t+d} = \arg\max_{1\le i\le N}\hat{\delta}_{t+d}(i). \tag{53}$$

Again, if $s^*_{t+d}$ coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimation of the failure time is $D_{\mathrm{avg}} = d_{\mathrm{avg}}(s^*_t)$ calculated at the previous step, with the bound values $D_{\mathrm{low}} = d_{\mathrm{low}}(s^*_t)$ and $D_{\mathrm{up}} = d_{\mathrm{up}}(s^*_t)$. Otherwise, the estimation of the sojourn time of the next state is calculated as follows:

$$d_{\mathrm{avg}}(s^*_{t+d}) = \sum_{i=1}^{N}\mu_{d_i}\odot\hat{\delta}_{t+d}(i), \tag{54}$$

$$d_{\mathrm{low}}(s^*_{t+d}) = \sum_{i=1}^{N}\left(\mu_{d_i} - \sigma_{d_i}\right)\odot\hat{\delta}_{t+d}(i), \tag{55}$$

$$d_{\mathrm{up}}(s^*_{t+d}) = \sum_{i=1}^{N}\left(\mu_{d_i} + \sigma_{d_i}\right)\odot\hat{\delta}_{t+d}(i). \tag{56}$$

This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:

$$D_{\mathrm{avg}} = \sum d_{\mathrm{avg}}, \tag{57}$$

$$D_{\mathrm{low}} = \sum d_{\mathrm{low}}, \tag{58}$$

$$D_{\mathrm{up}} = \sum d_{\mathrm{up}}. \tag{59}$$

Finally, Algorithm 1 details the above-described RUL estimation procedure.

5. Experimental Results

To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


(1) function RulEstimation(x_t, S_k)        ⊳ x_t: the last observation acquired
(2)                                         ⊳ S_k: the failure state
(3) Initialization:
(4)   D_avg ← 0
(5)   D_low ← 0
(6)   D_up ← 0
(7) Current state estimation:
(8)   Calculate δ̂_t                         ⊳ using (48)
(9)   Calculate s*_t                        ⊳ using (34)
(10)  Calculate d̄_t                         ⊳ using (20)
(11)  S ← s*_t
(12) Loop:
(13) while S ≠ S_k do
(14)   Calculate d_avg                      ⊳ using (49) or (54)
(15)   Calculate d_low                      ⊳ using (50) or (55)
(16)   Calculate d_up                       ⊳ using (51) or (56)
(17)   D_avg ← D_avg + d_avg
(18)   D_low ← D_low + d_low
(19)   D_up ← D_up + d_up
(20)   Calculate δ̂_next                     ⊳ using (52)
(21)   Calculate s*_next                    ⊳ using (53)
(22)   S ← s*_next
end while
(23) return D_avg, D_low, D_up

Algorithm 1: Remaining Useful Lifetime estimation (pseudocode).
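A possible NumPy rendering of Algorithm 1 is sketched below under the assumptions used so far. It takes as inputs quantities produced online by the forward/Viterbi sketches, and it bounds the projection loop by the number of states, since in a left-right chain the failure state is reached within N projections; this guard is an addition for safety and not part of the paper's pseudocode.

```python
import numpy as np

def estimate_rul(delta_t, d_bar_t, A0, mu_d, sigma_d, failure_state):
    """Sketch of Algorithm 1 / Eqs. (48)-(59).
    delta_t  -- unnormalized Viterbi scores delta_t(i) at the current time
    d_bar_t  -- current average durations d_t(i) (Eq. (20))
    mu_d, sigma_d -- mean and std of each state's duration distribution"""
    delta_hat = delta_t / delta_t.sum()                     # Eq. (48)
    if int(np.argmax(delta_hat)) == failure_state:          # Eq. (34)
        return 0.0, 0.0, 0.0
    # remaining time in the current state, Eqs. (49)-(51)
    D_avg = np.sum((mu_d - d_bar_t) * delta_hat)
    D_low = np.sum((mu_d - sigma_d - d_bar_t) * delta_hat)
    D_up = np.sum((mu_d + sigma_d - d_bar_t) * delta_hat)
    # project future states until the failure state is predicted
    for _ in range(len(mu_d)):
        delta_hat = A0.T @ delta_hat                        # Eq. (52)
        if int(np.argmax(delta_hat)) == failure_state:      # Eq. (53)
            break
        D_avg += np.sum(mu_d * delta_hat)                   # Eq. (54)
        D_low += np.sum((mu_d - sigma_d) * delta_hat)       # Eq. (55)
        D_up += np.sum((mu_d + sigma_d) * delta_hat)        # Eq. (56)
    return D_avg, D_low, D_up                               # Eqs. (57)-(59)
```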

5.1. Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.

5.1.1. Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with $N = 5$ states, having state $S_5$ as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective to solve automatic model selection, online condition monitoring, and prediction problems. For this purpose, we divided the experiments in two cases, according to the nature of the observations, being continuous or discrete.

For each of the continuous and the discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains $T = 650$ observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM as follows:

$$\pi = \begin{bmatrix}1\\0\\0\\0\\0\end{bmatrix},\qquad A^0 = \begin{bmatrix}0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 1\end{bmatrix},$$

$$\Theta_{\mathcal{N}} = \{\theta_1 = [100, 20],\ \theta_2 = [90, 15],\ \theta_3 = [100, 20],\ \theta_4 = [80, 25],\ \theta_5 = [200, 1]\},$$

$$\Theta_{G} = \{\theta_1 = [500, 0.2],\ \theta_2 = [540, 0.1667],\ \theta_3 = [500, 0.2],\ \theta_4 = [256, 0.3125],\ \theta_5 = [800, 0.005]\},$$

$$\Theta_{W} = \{\theta_1 = [102, 28],\ \theta_2 = [92, 29],\ \theta_3 = [102, 28],\ \theta_4 = [82, 20],\ \theta_5 = [200, 256]\}, \tag{60}$$

where $\Theta_{\mathcal{N}}$, $\Theta_G$, and $\Theta_W$ are the different distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean $\mu_d$ and the variance $\sigma^2_d$ of the Gaussian distribution, the shape $\nu_d$ and the scale $\eta_d$ of the Gamma distribution, and the scale $a_d$ and the shape $b_d$ of the Weibull distribution.


[Figure 2: The data generated with the parameters described in Section 5.1.1. (a) Example of simulated data for the continuous case (hidden state sequence, state duration, and observed signal); (b) example of simulated data for the discrete case (hidden state sequence, state duration, and observed symbols).]

It must be noticed that, as explained in Section 2.1, for state $S_5$, being the absorbing state, the duration parameters $\theta_5$ have no influence on the data since, once the state $S_5$ is reached, the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:

$$\mu_1 = \begin{bmatrix}20\\20\end{bmatrix},\quad \mu_2 = \begin{bmatrix}20\\35\end{bmatrix},\quad \mu_3 = \begin{bmatrix}35\\35\end{bmatrix},\quad \mu_5 = \begin{bmatrix}28\\28\end{bmatrix},$$

$$U_1 = \begin{bmatrix}20 & 0\\0 & 20\end{bmatrix},\quad U_2 = \begin{bmatrix}15 & 0\\0 & 15\end{bmatrix},\quad U_3 = \begin{bmatrix}15 & -2\\-2 & 15\end{bmatrix},\quad U_4 = \begin{bmatrix}5 & 0\\0 & 5\end{bmatrix},\quad U_5 = \begin{bmatrix}10 & 3\\3 & 10\end{bmatrix}, \tag{61}$$

while for the discrete case $L = 7$ distinct observation symbols have been taken into consideration, with the following observation probability distribution:

$$B = \begin{bmatrix}
0.8 & 0.2 & 0 & 0 & 0 & 0 & 0\\
0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0\\
0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0\\
0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0\\
0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1
\end{bmatrix}. \tag{62}$$

An example of simulated data, for both the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
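A generic sampler of this kind of left-right HSMM data could be sketched as follows. This is only an illustrative generator, not the authors' simulator; `dur_sampler` and `obs_sampler` are hypothetical callables supplied by the user (e.g., a Gamma duration draw and a bivariate Gaussian or categorical emission per state).

```python
import numpy as np

def sample_left_right_hsmm(A0, pi, dur_sampler, obs_sampler, T=650, rng=None):
    """Draw a state, sample its sojourn time from the chosen duration law,
    emit one observation per time step, then move through A0 (never a
    self-transition except for the absorbing state)."""
    rng = rng or np.random.default_rng()
    states, obs = [], []
    s = rng.choice(len(pi), p=pi)
    while len(states) < T:
        d = max(1, int(round(dur_sampler(s))))     # sojourn time in state s
        for _ in range(d):
            states.append(s)
            obs.append(obs_sampler(s))
            if len(states) == T:
                break
        s = rng.choice(len(pi), p=A0[s])           # next state
    return np.array(states), np.array(obs)
```

For example, with the Gamma parameters of (60) one could set `dur_sampler = lambda i: rng.gamma(shape[i], scale[i])`.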

5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection problem. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states $N$ from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures $M$ in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fitting [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters $\lambda^0$, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters $\lambda^*$ corresponding to the minimum AIC value has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.

The obtained results are shown in Figure 3, for both the continuous and discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered as an optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered as an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.

5.1.3. Condition Monitoring. The optimal parameters $\lambda^*$ obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, $\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t$. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state $s^*_t = \arg\max_{1\le i\le N}[\delta_t(i)]$, as specified in (34).


[Figure 3: Akaike Information Criterion (AIC) values versus the number of states (2 to 8) for Gaussian, Gamma, and Weibull duration distributions. Panels: (a) continuous data, Gaussian duration; (b) continuous data, Gamma duration; (c) continuous data, Weibull duration; (d) discrete data, Gaussian duration; (e) discrete data, Gamma duration; (f) discrete data, Weibull duration. AIC is effective for automatic model selection, since its minimum value corresponds to the same number of states and duration model used to generate the data.]

An example of execution of the condition monitoring experiment is shown in Figure 4, for both continuous and discrete observations. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of the testing cases.


[Figure 4: Condition monitoring using the Viterbi path. (a) State estimation for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992). HSMMs can be effective for condition monitoring in time-dependent applications, due to their high accuracy in hidden state recognition.]

The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.

5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase we considered the state $S_5$ as the failure state and the trained parameters $\lambda^*$ of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations $\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t$ up to time $t$. When a new observation is acquired, after the current state probability $\hat{\delta}_t(i)$ is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average as well as the lower and the upper bound estimations converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches, and the estimation becomes more precise until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.

To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time $t$, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

$$\mathrm{APE}(t) = \left|\mathrm{RUL}_{\mathrm{real}}(t) - \mathrm{RUL}(t)\right|, \tag{63}$$

where $\mathrm{RUL}_{\mathrm{real}}(t)$ is the (known) value of the RUL at time $t$, while $\mathrm{RUL}(t)$ is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

$$\overline{\mathrm{APE}} = \frac{\sum_{t=1}^{T}\mathrm{APE}(t)}{T}, \tag{64}$$

where $T$ is the length of the testing signal. APE being a prediction error, average values of (64) close to zero correspond to good predictive performances.
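For reference, the metric of (63)–(64) reduces to a few lines of NumPy (an illustrative helper, with hypothetical argument names):

```python
import numpy as np

def average_ape(rul_real, rul_pred):
    """Average absolute prediction error of Eqs. (63)-(64)."""
    ape = np.abs(np.asarray(rul_real) - np.asarray(rul_pred))  # Eq. (63)
    return ape.mean()                                          # Eq. (64)
```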

The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b), for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which the prediction errors obtained for continuous and discrete observations are reported, respectively.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi.


[Figure 5: True, average, upper, and lower RUL versus time. (a) Remaining Useful Lifetime estimation for continuous data and Weibull duration distribution; (b) Remaining Useful Lifetime estimation for discrete data and Gamma duration distribution. HSMMs effectively solve RUL estimation problems: the prediction converges to the actual RUL value and its uncertainty decreases as the real failure time approaches.]

Table 1: State recognition accuracy (%).

(a) Continuous observations

Test case    Gaussian    Gamma    Weibull
1            99.4        98.5     99.2
2            99.7        98.6     99.5
3            99.4        99.2     99.7
4            98.9        98.9     99.7
5            98.2        98.9     100
6            99.1        98.8     99.7
7            98.5        99.4     99.7
8            99.2        99.1     99.5
9            99.2        98.6     99.7
10           99.2        99.1     99.5
Average      99.1        98.9     99.6

(b) Discrete observations

Test case    Gaussian    Gamma    Weibull
1            97.4        96.7     97.4
2            97.2        97.6     96.5
3            99.4        95.8     96.6
4            98.2        95.3     97.7
5            99.1        97.4     97.5
6            97.8        97.7     97.8
7            95.8        97.2     96.6
8            97.7        96.4     97.2
9            98.9        97.2     98.5
10           99.2        95.6     96.9
Average      98.1        96.7     97.3

This is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi given by (16).

5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).


Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20).

(a) APE of the RUL estimation for the continuous observation test cases

             Gaussian                  Gamma                     Weibull
Test case    APE avg  APE up  APE low  APE avg  APE up  APE low  APE avg  APE up  APE low
1            5.1      17.0    6.7      14.0     29.0    0.91     4.5      17.0    8.1
2            7.6      19.0    5.0      6.1      21.0    8.5      6.6      19.0    6.1
3            7.7      5.4     19.0     2.9      12.0    17.0     16.0     29.0    3.0
4            9.0      21.0    2.9      7.5      22.0    6.8      6.0      19.0    6.7
5            7.3      19.0    4.7      2.2      14.0    14.0     3.9      17.0    8.7
6            6.5      18.0    5.6      5.1      18.0    10.0     14.0     27.0    2.7
7            4.7      16.0    7.5      4.8      17.0    11.0     1.2      13.0    12.0
8            10.0     22.0    2.9      5.2      18.0    10.0     9.2      22.0    3.9
9            3.1      9.2     14.0     2.0      16.0    13.0     8.2      21.0    4.9
10           6.4      18.0    5.6      7.5      22.0    6.9      3.3      12.0    13.0
Average      6.8      17.0    7.4      5.7      19.0    9.9      7.3      20.0    7.0

(b) APE of the RUL estimation for the discrete observation test cases

             Gaussian                  Gamma                     Weibull
Test case    APE avg  APE up  APE low  APE avg  APE up  APE low  APE avg  APE up  APE low
1            2.1      11.0    14.0     3.1      8.8     14.0     2.4      12.0    13.0
2            2.1      11.0    13.0     11.0     22.0    3.3      19.0     32.0    7.1
3            5.1      17.0    7.6      6.6      18.0    5.1      2.3      14.0    11.0
4            5.9      6.5     18.0     5.2      17.0    6.7      4.2      16.0    9.0
5            3.2      14.0    10.0     8.3      19.0    3.4      12.0     24.0    2.9
6            12.0     24.0    2.7      6.2      18.0    5.2      4.1      8.4     16.0
7            2.9      15.0    9.7      9.3      21.0    2.3      19.0     31.0    6.6
8            15.0     27.0    7.0      7.4      18.0    4.3      4.3      17.0    9.4
9            5.9      18.0    7.7      11.0     23.0    5.5      3.9      16.0    8.8
10           3.5      11.0    14.0     5.5      6.0     16.0     5.2      17.0    7.1
Average      5.7      15.0    10.0     7.4      17.0    6.6      7.7      19.0    9.0

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.

The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.

Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32].

(a) APE of the RUL estimation for the continuous observation test cases

             Gaussian                  Gamma                     Weibull
Test case    APE avg  APE up  APE low  APE avg  APE up  APE low  APE avg  APE up  APE low
1            57.8     51.0    66.8     26.2     9.7     52.7     25.9     28.4    64.6
2            50.2     44.4    57.7     21.3     17.0    46.9     29.0     19.2    70.8
3            50.3     44.7    57.3     27.1     8.7     56.5     34.5     13.9    73.4
4            51.8     46.0    60.4     21.3     14.3    45.9     34.9     17.1    78.7
5            59.4     53.7    66.2     29.0     9.5     55.4     33.4     15.6    74.9
6            58.0     51.7    67.1     25.8     8.3     54.1     23.1     25.8    66.5
7            59.4     53.6    66.9     18.2     12.5    47.7     36.0     17.1    74.4
8            63.4     55.6    72.3     19.4     15.7    44.1     34.8     17.8    77.0
9            49.1     43.5    57.0     14.5     17.1    43.2     25.1     26.7    67.0
10           54.4     48.4    62.8     23.2     7.9     52.7     24.1     24.5    67.4
Average      55.4     49.3    63.5     22.6     12.1    49.9     30.1     20.6    71.5

(b) APE of the RUL estimation for the discrete observation test cases

             Gaussian                  Gamma                     Weibull
Test case    APE avg  APE up  APE low  APE avg  APE up  APE low  APE avg  APE up  APE low
1            51.4     41.0    62.4     42.4     31.8    53.0     32.6     26.4    73.6
2            49.6     39.9    60.4     59.5     48.3    70.8     31.3     27.6    69.3
3            50.2     38.6    62.3     46.5     35.7    57.4     32.4     25.7    70.2
4            42.2     31.5    53.8     50.1     40.5    60.6     23.7     36.1    60.3
5            44.3     33.9    55.8     47.8     37.4    59.1     36.0     25.6    76.5
6            52.2     43.2    62.7     55.2     44.3    66.9     27.2     31.6    64.3
7            55.0     43.9    66.8     56.0     45.7    67.0     34.7     23.2    74.4
8            50.3     39.0    62.0     60.4     50.5    71.0     35.1     26.4    72.4
9            55.5     47.4    64.0     48.0     37.2    59.5     31.8     22.2    73.6
10           49.0     38.2    60.7     52.1     41.2    63.1     29.4     28.9    68.7
Average      50.0     39.7    61.1     51.8     41.3    62.9     31.4     27.4    70.4

The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1                 Condition 2                 Condition 3
1800 rpm and 4000 N         1650 rpm and 4200 N         1500 rpm and 5000 N
Bearing      Lifetime [s]   Bearing      Lifetime [s]   Bearing      Lifetime [s]
Bearing1_1   28030          Bearing2_1   9110           Bearing3_1   5150
Bearing1_2   8710           Bearing2_2   7970           Bearing3_2   16370
Bearing1_3   23750          Bearing2_3   19550          Bearing3_3   4340
Bearing1_4   14280          Bearing2_4   7510
Bearing1_5   24630          Bearing2_5   23110
Bearing1_6   24480          Bearing2_6   7010
Bearing1_7   22590          Bearing2_7   2300

[Figure 6: Global overview of the Pronostia experimental platform [19], showing the load module, the tested bearing, the data acquisition module, and the rotating module.]

Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal was higher than 20 g; this moment was thus defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].

In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate the RMS as

$$x^{\mathrm{RMS}}_w = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r^2_w(t)}$$

and the kurtosis as

$$x^{\mathrm{KURT}}_w = \frac{\frac{1}{L}\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^4}{\left(\frac{1}{L}\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^2\right)^2},$$

where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM, after the model selection procedure, we implemented a leave-one-out cross validation scheme: by considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings, using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
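The windowed RMS/kurtosis preprocessing described above can be sketched as follows (an illustrative helper rather than the authors' code; non-overlapping windows of 2560 samples are assumed):

```python
import numpy as np

def rms_kurtosis_features(raw, window=2560):
    """One (RMS, kurtosis) feature vector per non-overlapping window of the
    horizontal-accelerometer signal, as in Section 5.2.1."""
    n_win = len(raw) // window
    feats = np.empty((n_win, 2))
    for w in range(n_win):
        r = raw[w * window:(w + 1) * window]
        rms = np.sqrt(np.mean(r ** 2))
        centered = r - r.mean()
        kurt = np.mean(centered ** 4) / (np.mean(centered ** 2) ** 2)
        feats[w] = (rms, kurt)
    return feats
```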

522 Bearings RUL Estimation We performed our exper-iments in two steps firstly we applied model selection inorder to determine an optimalmodel structure and secondlywe estimated the RUL of the bearings The full procedure isdetailed in the following

[Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].]

[Figure 8: Raw vibration data (a) versus the RMS and kurtosis features extracted per window (b) for Bearing1_1.]

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (being Gaussian, Gamma, or Weibull), (ii) an increasing number of states $N$, from 2 to 6, and (iii) an increasing number of Gaussian mixtures $M$ in the observation density, from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations $\lambda_0$, on the data sets (Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

Similarly to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ Gaussian mixture for the observation density.
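A schematic sketch of this selection loop is shown below; `train_hsmm`, `loglikelihood`, and `num_parameters` are hypothetical placeholders standing for the (unspecified) learning routine, likelihood evaluation of Equation (25), and the parameter count of Section 3, not an existing API.

    import itertools

    def select_structure(train_seqs, T, n_restarts=120):
        """Grid search over duration family, number of states N and number
        of mixtures M; keep the structure with the lowest AIC (Eq. (46))."""
        best = None
        for family, N, M in itertools.product(
                ["gaussian", "gamma", "weibull"], range(2, 7), range(1, 5)):
            # best likelihood over the random initialisations lambda_0
            logL = max(loglikelihood(train_hsmm(train_seqs, family, N, M))
                       for _ in range(n_restarts))
            aic = (-logL + num_parameters(family, N, M)) / T
            if best is None or aic < best[0]:
                best = (aic, family, N, M)
        return best  # (AIC value, duration family, N, M)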

(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross-validation scheme by using, for condition 1, at each iteration Bearing1_i, $1 \le i \le 7$, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters $\lambda^*_i$ were estimated for the $i$th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time $t$, the average, lower, and upper RUL as specified in (57), (58), and (59), respectively, considering the state $S_4$ as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty of the estimation decreases with time.

[Figure 9: AIC values as a function of the number of states (from 2 to 6) for the Gaussian, Gamma, and Weibull duration models: (a) condition 1; (b) condition 2. In both cases the minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ mixture in the observation density.]

[Figure 10: True, average, lower, and upper RUL over time: (a) RUL estimation for Bearing1_7; (b) RUL estimation for Bearing2_6. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.]

Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.

Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1

Test Bearing    APE_avg    APE_low    APE_up
Bearing1_1      10571.6    12723.0     9414.6
Bearing1_2       4331.2     3815.6     3821.3
Bearing1_3       2997.0     9730.9     6091.2
Bearing1_4       6336.3     2876.6    14871.9
Bearing1_5       1968.9     7448.4    10411.5
Bearing1_6       4253.0     9896.4     9793.7
Bearing1_7       1388.0     7494.3    10088.1
Average          4549.4     7712.2     9213.2

(b) Condition 2

Test Bearing    APE_avg    APE_low    APE_up
Bearing2_1       2475.9     5006.5     7287.5
Bearing2_2       1647.3     4497.2     8288.6
Bearing2_3       8877.1     9508.3     7962.1
Bearing2_4       1769.8     4248.6     4982.5
Bearing2_5       8663.1    10490.0    10730.0
Bearing2_6        877.1     3504.7     6687.0
Bearing2_7       3012.5     3866.4     6651.9
Average          3903.3     5874.5     7512.8
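Since Equation (64) is not restated here, the sketch below assumes the usual definition of the average absolute prediction error, that is, the mean over time of |RUL_true(t) − RUL_est(t)|; both input arrays are hypothetical.

    import numpy as np

    def average_absolute_prediction_error(rul_true, rul_est):
        """Mean absolute deviation between the true RUL curve and the
        online RUL estimates, expressed in the same time unit (seconds)."""
        rul_true = np.asarray(rul_true, dtype=float)
        rul_est = np.asarray(rul_est, dtype=float)
        return np.mean(np.abs(rul_true - rul_est))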

6 Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few parameters required to be estimated. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure, capable of adapting the model parameters to new conditions, will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as
$$\bar d_{t+1}(i)=\frac{a_{ii}(\mathbf{d}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\big(\bar d_t(i)+1\big). \tag{A.1}$$

The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution
$$d_t(i)\sim f(d). \tag{A.2}$$

We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as
$$P(d_t(i)=d)=P(s_{t-d-1}\neq S_i,\,s_{t-d}=S_i,\dots,s_{t-1}=S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t,\lambda). \tag{A.3}$$

We omit the conditioning on the model parameters $\lambda$ in the following equations, being inherently implied. We are interested in deriving the estimator $\bar d_t(i)$ of $d_t(i)$, defined as its expected value (see Equation (15)):
$$\bar d_t(i)=\mathbb{E}\big(d_t(i)\mid s_t=S_i,\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t\big),\quad 1\le i\le N. \tag{A.4}$$

From the definition of expectation we have
$$\bar d_t(i)=\sum_{d=1}^{t} d\cdot P(d_t(i)=d)=\sum_{d=1}^{t} d\cdot P(s_{t-d-1}\neq S_i,\,s_{t-d}=S_i,\dots,s_{t-1}=S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t). \tag{A.5}$$

For $\bar d_{t+1}(i)$ we have
$$\begin{aligned}\bar d_{t+1}(i)&=\sum_{d=1}^{t+1} d\cdot P(s_{t-d}\neq S_i,\,s_{t-d+1}=S_i,\dots,s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})\\ &=\underbrace{P(s_{t-1}\neq S_i,\,s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})}_{(a)}\\ &\quad+\sum_{d=2}^{t+1} d\cdot P(s_{t-d}\neq S_i,\,s_{t-d+1}=S_i,\dots,s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1}).\end{aligned} \tag{A.6, A.7}$$

By noticing that
$$P(s_{t-d}\neq S_i,\dots,s_{t-1}=S_i\mid s_t=S_i,s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})=\frac{P(s_{t-d}\neq S_i,\dots,s_{t-1}=S_i,s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})}{P(s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})}, \tag{A.8}$$
we can replace the probability of the second term of (A.7) with
$$P(s_{t-d}\neq S_i,\dots,s_{t-1}=S_i,s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})=\underbrace{P(s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})}_{(b)}\cdot P(s_{t-d}\neq S_i,\dots,s_{t-1}=S_i\mid s_t=S_i,s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1}). \tag{A.9, A.10}$$

In the last factor of (A.10), we can omit the information about the current state and observation by observing that
$$P(s_{t-d}\neq S_i,\dots,s_{t-1}=S_i\mid s_t=S_i,s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})\approx\underbrace{P(s_{t-d}\neq S_i,\dots,s_{t-1}=S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t)}_{(c)} \tag{A.11}$$
if the following independencies hold:
$$s_{t+1}\perp \{s_{t-d+1},\dots,s_{t-1}\}\mid \{s_t,\mathbf{x}_1,\dots,\mathbf{x}_t\},\qquad \mathbf{x}_{t+1}\perp \{s_{t-d+1},\dots,s_{t-1}\}\mid \{s_t,\mathbf{x}_1,\dots,\mathbf{x}_t\}, \tag{A.12}$$
where with $\perp$ we denote independency. Equations (A.12) hold for HMMs (even without conditioning on $\mathbf{x}_1,\dots,\mathbf{x}_t$), but they do not hold for HSMMs, since the state duration (expressed by $s_{t-d+1},\dots,s_{t-1}$) determines the system evolution. On the other hand, the state duration is partially known through the observations $\mathbf{x}_1,\dots,\mathbf{x}_t$. Thus, the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A.6), (A.9), and (A.11) we obtain
$$\begin{aligned}\bar d_{t+1}(i)&=(a)+\sum_{d=2}^{t+1} d\cdot (b)\cdot (c)\\ &=\underbrace{P(s_{t-1}\neq S_i,\,s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})}_{P(A,B\mid C)=P(A\mid B,C)\cdot P(B\mid C)}+\sum_{d=2}^{t+1} d\cdot\underbrace{P(s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})}_{\text{does not depend on }d}\cdot P(s_{t-d}\neq S_i,\dots,s_{t-1}=S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t)\\ &=P(s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})\cdot\Bigg[\underbrace{P(s_{t-1}\neq S_i\mid s_t=S_i,s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t,\mathbf{x}_{t+1})}_{\approx P(s_{t-1}\neq S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t)\text{ by the approximation of (A.11)}}+\sum_{d=2}^{t+1} d\cdot P(s_{t-d}\neq S_i,\dots,s_{t-1}=S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t)\Bigg]\\ &=P(s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})\cdot\Bigg[P(s_{t-1}\neq S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t)+\sum_{d'=1}^{t}(d'+1)\cdot P(s_{t-d'-1}\neq S_i,\,s_{t-d'}=S_i,\dots,s_{t-1}=S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t)\Bigg].\end{aligned} \tag{A.13}$$

Noticing that
$$\sum_{d'=1}^{t} P(s_{t-d'-1}\neq S_i,\,s_{t-d'}=S_i,\dots,s_{t-1}=S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t)+P(s_{t-1}\neq S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t)=1, \tag{A.14}$$
because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A.13) as follows:
$$\bar d_{t+1}(i)=P(s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})\cdot\big(\bar d_t(i)+1\big). \tag{A.15}$$

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state $i$ in the previous step.

In order to transform (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar d_{t+1}(i)$, we can consider the following equality:
$$P(s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})=\frac{P(s_t=S_i,s_{t+1}=S_i\mid\mathbf{x}_1,\dots,\mathbf{x}_{t+1})}{\underbrace{P(s_{t+1}=S_i\mid\mathbf{x}_1,\dots,\mathbf{x}_{t+1})}_{\gamma_{t+1}(i)}}. \tag{A.16}$$

If we consider the terms involved in the probability at the numerator of the right-hand side of (A.16), we have that
$$\underbrace{\mathbf{x}_1,\dots,\mathbf{x}_t}_{B}\ \perp\ \underbrace{\mathbf{x}_{t+1}}_{C}\ \Big|\ \underbrace{s_t=S_i,\ s_{t+1}=S_i}_{A}. \tag{A.17}$$

If $B\perp C\mid A$, by the Bayes rule we have that
$$P(A\mid C,B)=\frac{P(C\mid A,B)\cdot P(A\mid B)}{P(C\mid B)}. \tag{A.18}$$

Hence we can rewrite the numerator of the right-hand side of (A.16) as follows:
$$\begin{aligned}P(s_t=S_i,s_{t+1}=S_i\mid\mathbf{x}_1,\dots,\mathbf{x}_{t+1})&=\frac{P(s_t=S_i,s_{t+1}=S_i\mid\mathbf{x}_1,\dots,\mathbf{x}_t)\cdot\overbrace{P(\mathbf{x}_{t+1}\mid s_t=S_i,s_{t+1}=S_i)}^{\mathbf{x}_{t+1}\perp s_t\mid s_{t+1}}}{P(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\dots,\mathbf{x}_t)}\\ &=\frac{P(s_{t+1}=S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t)\cdot\overbrace{P(s_t=S_i\mid\mathbf{x}_1,\dots,\mathbf{x}_t)}^{\gamma_t(i)}\cdot\overbrace{P(\mathbf{x}_{t+1}\mid s_{t+1}=S_i)}^{b_i(\mathbf{x}_{t+1})}}{P(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\dots,\mathbf{x}_t)}.\end{aligned} \tag{A.19}$$

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as
$$P(s_{t+1}=S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t)=\sum_{d_t} a_{ii}(d_t)\cdot P(d_t\mid\mathbf{x}_1,\dots,\mathbf{x}_t)\approx a_{ii}(\mathbf{d}_t), \tag{A.20}$$
while the denominator of (A.19) can be expressed as follows:
$$P(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\dots,\mathbf{x}_t)=\frac{P(\mathbf{x}_1,\dots,\mathbf{x}_t,\mathbf{x}_{t+1})}{P(\mathbf{x}_1,\dots,\mathbf{x}_t)}=\frac{\sum_{i=1}^{N}\alpha_{t+1}(i)}{\sum_{i=1}^{N}\alpha_t(i)}. \tag{A.21}$$

By substituting (A.20) and (A.21) in (A.19) we obtain
$$P(s_t=S_i,s_{t+1}=S_i\mid\mathbf{x}_1,\dots,\mathbf{x}_{t+1})=\frac{a_{ii}(\mathbf{d}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)}, \tag{A.22}$$
and then, by combining (A.22) and (A.16), we obtain
$$P(s_t=S_i\mid s_{t+1}=S_i,\mathbf{x}_1,\dots,\mathbf{x}_{t+1})=\frac{a_{ii}(\mathbf{d}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\cdot\sum_{i=1}^{N}\alpha_{t+1}(i)}. \tag{A.23}$$

Finally, by substituting (A.23) in (A.15) and considering that
$$\gamma_t(i)=\frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)}, \tag{A.24}$$
we derive the induction formula for $\bar d_{t+1}(i)$ in terms of model parameters as
$$\bar d_{t+1}(i)=\frac{a_{ii}(\mathbf{d}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\big(\bar d_t(i)+1\big). \tag{A.25}$$
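A direct transcription of (A.25) into code could look as follows; the arrays alpha_t, alpha_tp1, b_xtp1 and the dynamic transition matrix a_dt are assumed to be produced by the forward recursion of Section 2.2.1, so this is only a sketch of one induction step.

    import numpy as np

    def update_duration(d_bar_t, a_dt, alpha_t, alpha_tp1, b_xtp1):
        """One step of Equation (A.25): d_bar_{t+1}(i) for all states i.
        a_dt is the dynamic transition matrix A_{d_t}; only its diagonal
        (the self-transitions a_ii(d_t)) enters the update."""
        a_ii = np.diag(a_dt)
        # P(s_t = S_i | s_{t+1} = S_i, x_1..x_{t+1}) expressed via model parameters
        weight = a_ii * alpha_t * b_xtp1 / alpha_tp1
        return weight * (d_bar_t + 1.0)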

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka, Japan, and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.



A simple example shows that (18) is incorrect: assuming an HSMM with three states and considering the state sequence $(S_1, S_1, S_2, \dots)$, the correct sequence of the duration vector is $\mathbf{d}_1=[1\ 1\ 1]^T$, $\mathbf{d}_2=[2\ 1\ 1]^T$, and $\mathbf{d}_3=[1\ 1\ 1]^T$, where the superscript $T$ denotes vector transpose. If we apply (18), we obtain $\mathbf{d}_1=[1\ 1\ 1]^T$, $\mathbf{d}_2=[2\ 1\ 1]^T$, and $\mathbf{d}_3=[1\ 2\ 1]^T$, which is in contradiction with the definition of the state duration vector given in (2).

To calculate the average state duration variable $\bar d_t(i)$, we propose a new induction formula that estimates, for each time $t$, the time spent in the $i$th state prior to $t$ as
$$\bar d_t(i)=P(s_{t-1}=S_i\mid s_t=S_i,\mathbf{x}_1,\dots,\mathbf{x}_t)\cdot\big(\bar d_{t-1}(i)+1\big) \tag{19}$$
$$=\frac{a_{ii}(\mathbf{d}_{t-1})\cdot\alpha_{t-1}(i)\cdot b_i(\mathbf{x}_t)}{\alpha_t(i)}\cdot\big(\bar d_{t-1}(i)+1\big),\quad 1\le i\le N. \tag{20}$$

The derivation of (20) is given in the Appendix. The intuition behind (19) is that the current average duration is the previous average duration plus one, weighted with the "amount" of the current state that was already in state $S_i$ in the previous step.

Using the proposed (20), the forward algorithm can be specified as follows:

(1) initialization, with $1\le i\le N$:
$$\alpha_1(i)=\pi_i\,b_i(\mathbf{x}_1),\qquad \bar d_1(i)=1,\qquad A_{\mathbf{d}_1}=P(\mathbf{d}_1)+\big(I-P(\mathbf{d}_1)\big)A^0, \tag{21}$$
where $P(\mathbf{d}_i)$ is estimated using (6);

(2) induction, with $1\le j\le N$ and $1\le t\le T-1$:
$$\alpha_{t+1}(j)=\Big[\sum_{i=1}^{N}\alpha_t(i)\,a_{ij}(\mathbf{d}_t)\Big]b_j(\mathbf{x}_{t+1}), \tag{22}$$
$$\bar d_{t+1}(i)=\frac{a_{ii}(\mathbf{d}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\big(\bar d_t(i)+1\big), \tag{23}$$
$$A_{\mathbf{d}_{t+1}}=P(\mathbf{d}_{t+1})+\big(I-P(\mathbf{d}_{t+1})\big)A^0, \tag{24}$$
where $a_{ij}(\mathbf{d}_t)$ are the coefficients of the matrix $A_{\mathbf{d}_t}$;

(3) termination:
$$P(\mathbf{x}\mid\lambda)=\sum_{i=1}^{N}\alpha_T(i). \tag{25}$$
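The three steps above can be sketched as follows; `transition_from_durations` stands for the mapping of Equation (24) (it is assumed to build A_{d_t} from the duration estimates, the duration parameters it closes over, and A^0) and `b` is the state-conditional observation density returning a length-N vector, both treated as given callables. In practice a scaling of the forward variable would be added to avoid numerical underflow.

    import numpy as np

    def forward(X, pi, A0, b, transition_from_durations):
        """Duration-dependent forward pass: returns alpha (T x N), the
        average duration variable d_bar (T x N) and P(x | lambda)."""
        T, N = len(X), len(pi)
        alpha = np.zeros((T, N))
        d_bar = np.ones((T, N))                 # d_bar_1(i) = 1
        alpha[0] = pi * b(X[0])                 # alpha_1(i) = pi_i b_i(x_1)
        A_d = transition_from_durations(d_bar[0], A0)        # Equation (24)
        for t in range(T - 1):
            alpha[t + 1] = (alpha[t] @ A_d) * b(X[t + 1])            # (22)
            d_bar[t + 1] = (np.diag(A_d) * alpha[t] * b(X[t + 1])
                            / alpha[t + 1]) * (d_bar[t] + 1.0)       # (23)
            A_d = transition_from_durations(d_bar[t + 1], A0)        # (24)
        return alpha, d_bar, alpha[-1].sum()                         # (25)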

Similar considerations as for the forward procedure can be made for the backward algorithm, which is implemented by defining the variable $\beta_t(i)$ as
$$\beta_t(i)=P(\mathbf{x}_{t+1}\mathbf{x}_{t+2}\cdots\mathbf{x}_T\mid s_t=S_i,\lambda),\quad 1\le i\le N. \tag{26}$$

Having estimated the dynamic transition matrix $A_{\mathbf{d}_t}$ for each $1\le t\le T$ using (24), the backward variable can be calculated inductively as follows:

(1) initialization:
$$\beta_T(i)=1,\quad 1\le i\le N; \tag{27}$$

(2) induction:
$$\beta_t(i)=\sum_{j=1}^{N}a_{ij}(\mathbf{d}_t)\,b_j(\mathbf{x}_{t+1})\,\beta_{t+1}(j),\quad t=T-1,T-2,\dots,1,\ 1\le i\le N. \tag{28}$$

Although the variable $\beta_t(i)$ is not necessary for the calculation of the model likelihood, it will be useful in the parameter reestimation procedure, as explained in Section 2.2.3.

2.2.2. The Viterbi Algorithm. The Viterbi algorithm [46, 47] (also known as decoding) allows determining the best state sequence corresponding to a given observation sequence. Formally, given a sequence of observations $\mathbf{x}=\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_T$, the best state sequence $S^*=s^*_1 s^*_2\cdots s^*_T$ corresponding to $\mathbf{x}$ is calculated by defining the variable $\delta_t(i)$ as
$$\delta_t(i)=\max_{s_1,s_2,\dots,s_{t-1}}P(s_1 s_2\cdots s_t=S_i,\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t\mid\lambda). \tag{29}$$

The procedure to recursively calculate the variable $\delta_t(i)$ and to retrieve the target state sequence (i.e., the arguments which maximize the $\delta_t(i)$'s) for the proposed HSMM is a straightforward extension of the Viterbi algorithm for HMMs [13]. The only change is the usage, in the recursive calculation of $\delta_t(i)$, of the dynamic transition matrix $A_{\mathbf{d}_t}=[a_{ij}(\mathbf{d}_t)]$ calculated through (24). The Viterbi algorithm for the introduced parametric HSMMs can be summarized as follows:

(1) initialization, with $1\le i\le N$:
$$\delta_1(i)=\pi_i\,b_i(\mathbf{x}_1),\qquad \psi_1(i)=0; \tag{30}$$

(2) recursion, with $1\le j\le N$ and $2\le t\le T$:
$$\delta_t(j)=\max_{1\le i\le N}\big[\delta_{t-1}(i)\,a_{ij}(\mathbf{d}_t)\big]\,b_j(\mathbf{x}_t), \tag{31}$$
$$\psi_t(j)=\arg\max_{1\le i\le N}\big[\delta_{t-1}(i)\,a_{ij}(\mathbf{d}_t)\big]; \tag{32}$$

(3) termination:
$$P^*=\max_{1\le i\le N}\big[\delta_T(i)\big], \tag{33}$$
$$s^*_T=\arg\max_{1\le i\le N}\big[\delta_T(i)\big], \tag{34}$$

where we keep track of the argument maximizing (31) using the vector $\psi_t$, which, tracked back, gives the desired best state sequence
$$s^*_t=\psi_{t+1}(s^*_{t+1}),\quad t=T-1,T-2,\dots,1. \tag{35}$$
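A compact sketch of this duration-aware decoding is given below; as in the forward sketch, `b` is the observation density and the per-step dynamic matrices `A_d[t]` are assumed to be available from Equation (24), so they are inputs rather than quantities computed here.

    import numpy as np

    def viterbi(X, pi, A_d, b):
        """Best state sequence for the parametric HSMM; A_d[t] is the
        dynamic transition matrix used at step t (Equations (31)-(32))."""
        T, N = len(X), len(pi)
        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)
        delta[0] = pi * b(X[0])                                   # (30)
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A_d[t]               # delta_{t-1}(i) a_ij(d_t)
            psi[t] = scores.argmax(axis=0)                        # (32)
            delta[t] = scores.max(axis=0) * b(X[t])               # (31)
        states = np.zeros(T, dtype=int)
        states[-1] = delta[-1].argmax()                           # (34)
        for t in range(T - 2, -1, -1):                            # (35)
            states[t] = psi[t + 1][states[t + 1]]
        return states, delta[-1].max()                            # (33)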


2.2.3. The Training Algorithm. The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by $\lambda=(A^0,\Theta,C,\mu,U,\pi)$ if the observations are continuous, or $\lambda=(A^0,\Theta,B,\pi)$ if the observations are discrete. Given a generic observation sequence $\mathbf{x}=\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_T$, referred to as the training set in the following, the training procedure consists of finding the model parameter set $\lambda^*$ which locally maximizes the model likelihood $P(\mathbf{x}\mid\lambda)$.

We use the modified Baum-Welch algorithm of Azimi et al. [30–32]. However, in our implementation we make no assumption on the density function used to model the state duration, and we consider both continuous and discrete observations.

Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure which consists of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters $\lambda^0$, and it is iterated until the likelihood function does not improve between two consecutive iterations.

Similarly to HMMs, the reestimation formulas are derived by firstly introducing the variable $\xi_t(i,j)$, which represents the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$, given the model and the observation sequence:
$$\xi_t(i,j)=P(s_t=S_i,s_{t+1}=S_j\mid\mathbf{x},\lambda). \tag{36}$$

However, in the HSMM case, the variable $\xi_t(i,j)$ considers the duration estimation performed in the forward algorithm (see Equation (24)). Formulated in terms of the forward and backward variables, it is given by
$$\xi_t(i,j)=\frac{P(s_t=S_i,s_{t+1}=S_j,\mathbf{x}\mid\lambda)}{P(\mathbf{x}\mid\lambda)}=\frac{\alpha_t(i)\,a_{ij}(\mathbf{d}_t)\,b_j(\mathbf{x}_{t+1})\,\beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_t(i)\,a_{ij}(\mathbf{d}_t)\,b_j(\mathbf{x}_{t+1})\,\beta_{t+1}(j)}. \tag{37}$$

From $\xi_t(i,j)$ we can derive the quantity $\gamma_t(i)$ (already defined in (17)), representing the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters:
$$\gamma_t(i)=\sum_{j=1}^{N}\xi_t(i,j). \tag{38}$$

Finally, the reestimation formulas for the parameters $\pi$ and $A^0$ are given by
$$\bar\pi_i=\gamma_1(i), \tag{39}$$
$$\bar a^0_{ij}=\frac{\big(\sum_{t=1}^{T-1}\xi_t(i,j)\big)\odot G}{\sum_{j=1}^{N}\big(\sum_{t=1}^{T-1}\xi_t(i,j)\big)\odot G}, \tag{40}$$
where $G=[g_{ij}]$ is a square matrix of dimensions $N\times N$, with $g_{ij}=0$ for $i=j$ and $g_{ij}=1$ for $i\neq j$, $\odot$ represents the element-by-element product between two matrices, $\sum_{t=1}^{T-1}\gamma_t(i)$ is the expected number of transitions from state $S_i$, and $\sum_{t=1}^{T-1}\xi_t(i,j)$ is the expected number of transitions from state $S_i$ to state $S_j$.

Equation (39) represents the expected number of times that the model starts in state $S_i$, while (40) represents the expected number of transitions from state $S_i$ to state $S_j$, with $i\neq j$, over the total expected number of transitions from state $S_i$ to any other state different from $S_i$.

For the matrix $A^0$, being normalized, the stochastic constraints are satisfied at each iteration, that is, $\sum_{j=1}^{N}\bar a^0_{ij}=1$ for each $1\le i\le N$, while the estimation of the prior probability $\bar\pi_i$ inherently sums up to 1 at each iteration, since it represents the expected frequency in state $S_i$ at time $t=1$, for each $1\le i\le N$.

With respect to the reestimation of the state duration parameters $\Theta$, firstly we estimate the mean $\mu_{i,d}$ and the variance $\sigma^2_{i,d}$ of the $i$th state duration, for each $1\le i\le N$, from the forward and backward variables and the estimation of the state duration variable:
$$\mu_{i,d}=\frac{\sum_{t=1}^{T-1}\alpha_t(i)\Big(\sum_{j=1,\,j\neq i}^{N}a_{ij}(\bar d_t(i))\,b_j(\mathbf{x}_{t+1})\,\beta_{t+1}(j)\Big)\,\bar d_t(i)}{\sum_{t=1}^{T-1}\alpha_t(i)\Big(\sum_{j=1,\,j\neq i}^{N}a_{ij}(\bar d_t(i))\,b_j(\mathbf{x}_{t+1})\,\beta_{t+1}(j)\Big)}, \tag{41}$$
$$\sigma^2_{i,d}=\frac{\sum_{t=1}^{T-1}\alpha_t(i)\Big(\sum_{j=1,\,j\neq i}^{N}a_{ij}(\bar d_t(i))\,b_j(\mathbf{x}_{t+1})\,\beta_{t+1}(j)\Big)\big(\bar d_t(i)-\mu_{i,d}\big)^2}{\sum_{t=1}^{T-1}\alpha_t(i)\Big(\sum_{j=1,\,j\neq i}^{N}a_{ij}(\bar d_t(i))\,b_j(\mathbf{x}_{t+1})\,\beta_{t+1}(j)\Big)}, \tag{42}$$
where (41) can be interpreted as the probability of transition from state $S_i$ to $S_j$, with $i\neq j$, at time $t$, weighted by the duration of state $S_i$ at $t$, giving the desired expected value, while in (42) the same quantity is weighted by the squared distance of the duration at time $t$ from its mean, giving the estimation of the variance.

Then, the parameters of the desired duration distribution can be estimated from $\mu_{i,d}$ and $\sigma^2_{i,d}$. For example, if a Gamma distribution with shape parameter $\nu$ and scale parameter $\eta$ is chosen to model the state duration, the parameters $\nu_i$ and $\eta_i$, for each $1\le i\le N$, can be calculated as $\nu_i=\mu^2_{i,d}/\sigma^2_{i,d}$ and $\eta_i=\sigma^2_{i,d}/\mu_{i,d}$.
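For instance, the Gamma case of this moment matching can be written as below; mu_d and var_d are assumed to be the per-state values produced by (41) and (42), and the mapping for other duration families (e.g., Weibull) would require its own moment equations.

    def gamma_duration_params(mu_d, var_d):
        """Method-of-moments mapping from the estimated duration mean and
        variance of each state to the Gamma shape nu_i and scale eta_i."""
        nu = [m * m / v for m, v in zip(mu_d, var_d)]    # nu_i  = mu_i^2 / sigma_i^2
        eta = [v / m for m, v in zip(mu_d, var_d)]       # eta_i = sigma_i^2 / mu_i
        return nu, eta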


Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the reestimation formulas are the same as for Hidden Markov Models [13].

In particular, for continuous observations, the parameters of the Gaussian mixture defined in (9) are reestimated by firstly defining the probability of being in state $S_j$ at time $t$, with the probability of the observation vector $\mathbf{x}_t$ evaluated by the $k$th mixture component, as
$$\gamma_t(j,k)=\Bigg[\frac{\alpha_t(j)\,\beta_t(j)}{\sum_{j=1}^{N}\alpha_t(j)\,\beta_t(j)}\Bigg]\cdot\Bigg[\frac{c_{jk}\,\mathcal{N}(\mathbf{x}_t,\mu_{jk},U_{jk})}{\sum_{m=1}^{M}c_{jm}\,\mathcal{N}(\mathbf{x}_t,\mu_{jm},U_{jm})}\Bigg]. \tag{43}$$

By using the former quantity, the parameters $c_{jk}$, $\mu_{jk}$, and $U_{jk}$ are reestimated through the following formulas:
$$\bar c_{jk}=\frac{\sum_{t=1}^{T}\gamma_t(j,k)}{\sum_{t=1}^{T}\sum_{m=1}^{M}\gamma_t(j,m)},\qquad \bar\mu_{jk}=\frac{\sum_{t=1}^{T}\gamma_t(j,k)\cdot\mathbf{x}_t}{\sum_{t=1}^{T}\gamma_t(j,k)},\qquad \bar U_{jk}=\frac{\sum_{t=1}^{T}\gamma_t(j,k)\cdot(\mathbf{x}_t-\mu_{jk})(\mathbf{x}_t-\mu_{jk})^T}{\sum_{t=1}^{T}\gamma_t(j,k)}, \tag{44}$$
where the superscript $T$ denotes vector transpose.

For discrete observations, the reestimation formula for the observation matrix $b_j(l)$ is
$$\bar b_j(l)=\frac{\sum_{t=1,\ x_t=X_l}^{T}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}, \tag{45}$$
where the quantity $\gamma_t(j)$, which takes into account the duration-dependent forward variable $\alpha_t(j)$, is calculated through (17).

The reader is referred to Rabiner's work [13] for the interpretation of the observation parameter reestimation formulas.

3. AIC-Based Model Selection

In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states $N$, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures $M$ to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been seen that, in the case of complex models and in the presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches like the Bayesian Information Criterion.

In general, information criteria are represented as a two-term structure. They account for a compromise between a measure of model fitness, which is based on the likelihood of the model, and a penalty term, which takes into account the model complexity. Usually the model complexity is measured in terms of the number of parameters that have to be estimated and in terms of the number of observations.

The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as
$$\mathrm{AIC}=\frac{-\log L(\hat\lambda)+p}{T}, \tag{46}$$
where $L(\hat\lambda)$ is the likelihood of the model with the estimated parameters, as defined in (25), $p$ is the number of model parameters, and $T$ is the length of the observed sequence. The best model is the one minimizing Equation (46).

Concerning $p$, the number of parameters to be estimated for a parametric HSMM with $N$ states is $p=p_h+p_o$, where $p_h$ are the parameters of the hidden states layer, while $p_o$ are those of the observation layer.

In particular, $p_h=(N-1)+(N-1)\cdot N+z\cdot N$, where

(i) $N-1$ accounts for the prior probabilities $\pi$;
(ii) $(N-1)\cdot N$ accounts for the nonrecurrent transition matrix $A^0$;
(iii) $z\cdot N$ accounts for the duration probability, being $z$ the number of parameters $\theta$ of the duration distribution.

Concerning $p_o$, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with $L$ possible observable values, $p_o=(L-1)\cdot N$, which accounts for the elements of the observation matrix $B$;
(ii) if the observations are continuous and a multivariate mixture of $M$ Gaussians with $O$ variates is used as observation model, $p_o=[O\cdot N\cdot M]+[O\cdot O\cdot N\cdot M]+[(M-1)\cdot N]$, where each term accounts, respectively, for the mean vectors $\mu$, the covariance matrices $U$, and the mixture coefficients $C$.
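The parameter count above translates directly into code; the sketch below simply restates Equation (46) and the counting rules of this section, with the continuous case selected by passing O and M and the discrete case by passing L.

    def num_parameters(N, z, O=None, M=None, L=None):
        """p = p_h + p_o as in Section 3. Pass O and M for continuous
        (mixture-of-Gaussians) observations, or L for discrete ones."""
        p_h = (N - 1) + (N - 1) * N + z * N
        if L is not None:                       # discrete observations
            p_o = (L - 1) * N
        else:                                   # M-component, O-variate mixtures
            p_o = O * N * M + O * O * N * M + (M - 1) * N
        return p_h + p_o

    def aic(log_likelihood, p, T):
        """Equation (46): the best model minimises this value."""
        return (-log_likelihood + p) / T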

4 Remaining Useful Lifetime Estimation

One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time $D$ before entering a determinate state.

As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine, and if the state $S_k$ that represents the failure condition is identified, at each moment the RUL can be defined as the expected time $D$ to reach the failure state $S_k$. If we assume that the time to failure is a random variable $D$ following a determinate probability density, we define the RUL at the current time $t$ as
$$\mathrm{RUL}_t=\bar D=\mathbb{E}(D),\qquad s_{t+\bar D}=S_k,\ s_{t+\bar D-1}=S_i,\quad 1\le i,k\le N,\ i\neq k, \tag{47}$$
where $\mathbb{E}$ denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn time.

The estimation of the current state is performed via the Viterbi path, that is, the variable $\delta_t=[\delta_t(i)]_{1\le i\le N}$ defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable $\bar\delta_t(i)$, obtained as
$$\bar\delta_t(i)=\max_{s_1,s_2,\dots,s_{t-1}}P(s_t=S_i\mid s_1 s_2\cdots s_{t-1},\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t,\lambda)=\frac{\delta_t(i)}{\sum_{j=1}^{N}\delta_t(j)},\quad 1\le i\le N, \tag{48}$$
that is, an estimate of the probability of being in state $S_i$ at time $t$.

Together with the normalized variable $\bar\delta_t(i)$, the maximum a posteriori estimate of the current state $s^*_t$ is taken into account, according to (34). If $s^*_t$ coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimation of the average remaining time in the current state, $d_{\mathrm{avg}}(s^*_t)$, is calculated as
$$d_{\mathrm{avg}}(s^*_t)=\sum_{i=1}^{N}\big(\mu_{d_i}-\bar d_t(i)\big)\odot\bar\delta_t(i), \tag{49}$$
where with $\mu_{d_i}$ we denote the expected value of the duration variable in state $S_i$, according to the duration distribution specified by the parameters $\theta_i$. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration $\bar d_t(i)$ at time $t$ from the expected sojourn time of state $S_i$, weighting the result using the uncertainty about the current state, $\bar\delta_t(i)$, and finally summing up all the contributions from each state.

In addition to the average remaining time, a lower and an upper bound value can be calculated, based on the standard deviation $\sigma_{d_i}$ of the duration distribution for state $S_i$:
$$d_{\mathrm{low}}(s^*_t)=\sum_{i=1}^{N}\big(\mu_{d_i}-\sigma_{d_i}-\bar d_t(i)\big)\odot\bar\delta_t(i), \tag{50}$$
$$d_{\mathrm{up}}(s^*_t)=\sum_{i=1}^{N}\big(\mu_{d_i}+\sigma_{d_i}-\bar d_t(i)\big)\odot\bar\delta_t(i). \tag{51}$$

Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimation, as follows:
$$\bar\delta_{\mathrm{next}}=[\bar\delta_{t+d}(i)]_{1\le i\le N}=(A^0)^T\cdot\bar\delta_t, \tag{52}$$
while the maximum a posteriori estimate of the next state $s^*_{\mathrm{next}}$ is calculated as
$$s^*_{\mathrm{next}}=s^*_{t+d}=\arg\max_{1\le i\le N}\bar\delta_{t+d}(i). \tag{53}$$

Again, if $s^*_{t+d}$ coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimation of the failure time is $D_{\mathrm{avg}}=d_{\mathrm{avg}}(s^*_t)$, calculated at the previous step, with the bound values $D_{\mathrm{low}}=d_{\mathrm{low}}(s^*_t)$ and $D_{\mathrm{up}}=d_{\mathrm{up}}(s^*_t)$. Otherwise, the estimation of the sojourn time of the next state is calculated as follows:
$$d_{\mathrm{avg}}(s^*_{t+d})=\sum_{i=1}^{N}\mu_{d_i}\odot\bar\delta_{t+d}(i), \tag{54}$$
$$d_{\mathrm{low}}(s^*_{t+d})=\sum_{i=1}^{N}\big(\mu_{d_i}-\sigma_{d_i}\big)\odot\bar\delta_{t+d}(i), \tag{55}$$
$$d_{\mathrm{up}}(s^*_{t+d})=\sum_{i=1}^{N}\big(\mu_{d_i}+\sigma_{d_i}\big)\odot\bar\delta_{t+d}(i). \tag{56}$$

This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:
$$D_{\mathrm{avg}}=\sum d_{\mathrm{avg}}, \tag{57}$$
$$D_{\mathrm{low}}=\sum d_{\mathrm{low}}, \tag{58}$$
$$D_{\mathrm{up}}=\sum d_{\mathrm{up}}. \tag{59}$$

Finally, Algorithm 1 details the above described RUL estimation procedure.

5 Experimental Results

To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).

10 Mathematical Problems in Engineering

(1) function RulEstimation(x_t, S_k)    ⊳ x_t: the last observation acquired
(2)                                     ⊳ S_k: the failure state
(3) Initialization:
(4)   D_avg ← 0
(5)   D_low ← 0
(6)   D_up ← 0
(7) Current state estimation:
(8)   Calculate δ_t        ⊳ using (48)
(9)   Calculate s*_t       ⊳ using (34)
(10)  Calculate d̂_t        ⊳ using (20)
(11)  S ← s*_t
(12) Loop:
(13)  while S ≠ S_k do
(14)    Calculate d_avg    ⊳ using (49) or (54)
(15)    Calculate d_low    ⊳ using (50) or (55)
(16)    Calculate d_up     ⊳ using (51) or (56)
(17)    D_avg ← D_avg + d_avg
(18)    D_low ← D_low + d_low
(19)    D_up ← D_up + d_up
(20)    Calculate δ_next   ⊳ using (52)
(21)    Calculate s*_next  ⊳ using (53)
(22)    S ← s*_next
      end while
(23)  return D_avg, D_low, D_up

Algorithm 1: Remaining Useful Lifetime estimation (pseudocode).
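The following minimal Python sketch mirrors the loop of Algorithm 1. It assumes that the current state probabilities δ_t (Equation (48)), the elapsed-duration estimates d̂_t (Equation (20)), the duration means and standard deviations of the fitted model, and the nonrecurrent transition matrix A^0 are already available; all function and variable names are illustrative and do not come from a reference implementation.

import numpy as np

def rul_estimation(delta_t, d_hat_t, mu_d, sigma_d, A0, failure_state, max_steps=1000):
    """Sketch of Algorithm 1: accumulate expected remaining sojourn times
    until the predicted next state is the failure state.
    delta_t, d_hat_t, mu_d, sigma_d are length-N arrays; A0 is an (N, N) matrix."""
    D_avg = D_low = D_up = 0.0
    delta = np.asarray(delta_t, dtype=float)
    spent = np.asarray(d_hat_t, dtype=float)      # time already spent (current state only)
    mu_d, sigma_d = np.asarray(mu_d, float), np.asarray(sigma_d, float)
    state = int(np.argmax(delta))                 # MAP state estimate, Eq. (34)
    for _ in range(max_steps):                    # guard against an unreachable failure state
        if state == failure_state:
            break
        # expected remaining sojourn time, Eqs. (49)-(51) / (54)-(56)
        D_avg += float(np.sum((mu_d - spent) * delta))
        D_low += float(np.sum((mu_d - sigma_d - spent) * delta))
        D_up += float(np.sum((mu_d + sigma_d - spent) * delta))
        delta = A0.T @ delta                      # next state probability, Eq. (52)
        state = int(np.argmax(delta))             # MAP next state, Eq. (53)
        spent = np.zeros_like(spent)              # no time has been spent yet in future states
    return D_avg, D_low, D_up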

5.1. Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performances of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.

5.1.1. Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with N = 5 states, having state S_5 as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective to solve automatic model selection, online condition monitoring, and prediction problems. To this purpose, we divided the experiments into two cases, according to the nature of the observation, being continuous or discrete.

For each of the continuous and the discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains T = 650 observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM as follows:

\pi = [1, 0, 0, 0, 0]^T

A^0 = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}

\Theta_N = \{\theta_1 = [100, 20], \theta_2 = [90, 15], \theta_3 = [100, 20], \theta_4 = [80, 25], \theta_5 = [200, 1]\}

\Theta_G = \{\theta_1 = [500, 0.2], \theta_2 = [540, 0.1667], \theta_3 = [500, 0.2], \theta_4 = [256, 0.3125], \theta_5 = [800, 0.005]\}

\Theta_W = \{\theta_1 = [102, 28], \theta_2 = [92, 29], \theta_3 = [102, 28], \theta_4 = [82, 20], \theta_5 = [200, 256]\}  (60)

where \Theta_N, \Theta_G, and \Theta_W are the different distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean \mu_d and the variance \sigma^2_d of the Gaussian distribution, the shape \nu_d and the scale \eta_d of the Gamma distribution, and the scale a_d and the shape b_d of the Weibull distribution.


Figure 2: The data generated with the parameters described in Section 5.1.1, both for the continuous and the discrete case. (a) Example of simulated data for the continuous case (hidden states sequence, state duration, observed signal); (b) example of simulated data for the discrete case (hidden states sequence, state duration, observed symbols).

It must be noticed that, as explained in Section 2.1, for state S_5, being the absorbing state, the duration parameters \theta_5 have no influence on the data, since once the state S_5 is reached the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used with the following parameters [15]:

\mu_1 = \begin{bmatrix} 20 \\ 20 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 20 \\ 35 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} 35 \\ 35 \end{bmatrix}, \quad \mu_5 = \begin{bmatrix} 28 \\ 28 \end{bmatrix}

U_1 = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix}, \quad U_2 = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix}, \quad U_3 = \begin{bmatrix} 15 & -2 \\ -2 & 15 \end{bmatrix}, \quad U_4 = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \quad U_5 = \begin{bmatrix} 10 & 3 \\ 3 & 10 \end{bmatrix}  (61)

while for the discrete case L = 7 distinct observation symbols have been taken into consideration, with the following observation probability distribution:

B = \begin{bmatrix}
0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\
0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\
0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\
0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\
0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1
\end{bmatrix}  (62)

An example of simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
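As an illustration of this generative setting, the Python sketch below samples one state/observation sequence from the left-right HSMM of (60) and (61) with Gaussian durations. It is only a sketch of the sampling logic, not the authors' generator; in particular, since the observation mean \mu_4 is not recoverable from the text above, a hypothetical placeholder value is used for it.

import numpy as np

rng = np.random.default_rng(0)

# Left-right HSMM of Section 5.1.1 with the Gaussian duration parameters Theta_N of Eq. (60).
A0 = np.array([[0., 1., 0., 0., 0.],
               [0., 0., 1., 0., 0.],
               [0., 0., 0., 1., 0.],
               [0., 0., 0., 0., 1.],
               [0., 0., 0., 0., 1.]])
dur_mean = np.array([100., 90., 100., 80., 200.])
dur_var = np.array([20., 15., 20., 25., 1.])
# Bivariate Gaussian observation parameters of Eq. (61).
# NOTE: mu_4 is not given above; [40, 40] is a hypothetical placeholder.
obs_mean = [[20, 20], [20, 35], [35, 35], [40, 40], [28, 28]]
obs_cov = [[[20, 0], [0, 20]], [[15, 0], [0, 15]], [[15, -2], [-2, 15]],
           [[5, 0], [0, 5]], [[10, 3], [3, 10]]]

def simulate(T=650, failure_state=4):
    """Sample one (states, observations) sequence of length T from the HSMM."""
    states, obs = [], []
    s = 0                                           # pi = [1, 0, 0, 0, 0]: always start in S_1
    while len(states) < T:
        if s == failure_state:
            d = T - len(states)                     # absorbing state: stay until the end
        else:
            d = max(1, int(round(rng.normal(dur_mean[s], np.sqrt(dur_var[s])))))
        for _ in range(min(d, T - len(states))):    # emit one observation per time step
            states.append(s)
            obs.append(rng.multivariate_normal(obs_mean[s], obs_cov[s]))
        s = int(rng.choice(5, p=A0[s]))             # move according to the nonrecurrent matrix
    return np.array(states), np.array(obs)

states, observations = simulate()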

5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states N from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures M in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fitting [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters \lambda^0, have been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters \lambda^* corresponding to the minimum AIC value has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.
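A schematic version of this selection loop is sketched below. The helper callables fit_hsmm and count_free_parameters are hypothetical stand-ins for the training routine and for the parameter count entering (46); the AIC itself is assumed here to follow the standard definition AIC = 2k − 2 ln L.

import itertools

def select_hsmm_by_aic(train_sequences, fit_hsmm, count_free_parameters,
                       n_states_range=range(2, 9),
                       duration_families=("gaussian", "gamma", "weibull"),
                       n_mixtures_range=range(1, 5), n_restarts=40):
    """Return the (aic, params, structure) triple with the lowest AIC.
    fit_hsmm(sequences, n_states, duration, n_mixtures, seed) -> (params, loglik)
    count_free_parameters(params) -> number of independently estimated parameters."""
    best = None
    for n_states, family, n_mix in itertools.product(n_states_range,
                                                     duration_families,
                                                     n_mixtures_range):
        for seed in range(n_restarts):              # random restarts of lambda_0
            params, loglik = fit_hsmm(train_sequences, n_states=n_states,
                                      duration=family, n_mixtures=n_mix, seed=seed)
            aic = 2 * count_free_parameters(params) - 2 * loglik
            if best is None or aic < best[0]:
                best = (aic, params, (n_states, family, n_mix))
    return best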

The obtained results are shown in Figure 3 for both the continuous and discrete observation data. As it can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered as an optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered as an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.

5.1.3. Condition Monitoring. The optimal parameters \lambda^* obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases for all the 6 considered HSMM configurations. In this experiment, we simulate online monitoring by considering all the testing observations up to the current time, that is, x_1 x_2 \cdots x_t. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state s^*_t = \arg\max_{1 \le i \le N} [\delta_t(i)], as specified in (34).


Figure 3: Akaike Information Criterion (AIC) applied to continuous and discrete observation data, as a function of the number of states (from 2 to 8) and of the duration distribution family (Gaussian, Gamma, Weibull). (a) AIC values for continuous data and Gaussian duration distribution; (b) AIC values for continuous data and Gamma duration distribution; (c) AIC values for continuous data and Weibull duration distribution; (d) AIC values for discrete data and Gaussian duration distribution; (e) AIC values for discrete data and Gamma duration distribution; (f) AIC values for discrete data and Weibull duration distribution. AIC is effective for automatic model selection, since its minimum value provides the same number of states and duration model used to generate the data.

An example of execution of the condition monitoring experiment is shown in Figure 4 for continuous and discrete observations, respectively. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of the testing cases.


Figure 4: Condition monitoring using the Viterbi path (true state sequence, estimated state sequence, and observations). (a) State estimation with the Viterbi path for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation with the Viterbi path for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992). HSMMs can be effective to solve condition monitoring problems in time-dependent applications due to their high accuracy in hidden state recognition.

The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
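For reference, the accuracy figure used in Table 1 reduces to a few lines of Python (a sketch; the variable names are illustrative):

import numpy as np

def state_accuracy(true_states, estimated_states):
    """Percentage of time steps in which the estimated state matches the true state."""
    true_states = np.asarray(true_states)
    estimated_states = np.asarray(estimated_states)
    return 100.0 * float(np.mean(true_states == estimated_states))

# Example: state_accuracy([1, 1, 2, 3], [1, 2, 2, 3]) returns 75.0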

5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase, we considered the state S_5 as the failure state and the trained parameters \lambda^* of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations x_1 x_2 \cdots x_t up to time t. When a new observation is acquired, after the current state probability \delta_t(i) is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average as well as the lower and the upper bound estimations converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches, and the estimation becomes more precise, until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.

To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time t, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

APE(t) = |RUL_{real}(t) - RUL(t)|  (63)

where RUL_{real}(t) is the (known) value of the RUL at time t, while RUL(t) is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

\overline{APE} = \frac{\sum_{t=1}^{T} APE(t)}{T}  (64)

where T is the length of the testing signal. \overline{APE} being a prediction error, average values of (64) close to zero correspond to good predictive performances.
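In code, (63) and (64) reduce to the following sketch:

import numpy as np

def ape(rul_real, rul_predicted):
    """Absolute prediction error APE(t) of Eq. (63), computed for every time step t."""
    return np.abs(np.asarray(rul_real, dtype=float) - np.asarray(rul_predicted, dtype=float))

def average_ape(rul_real, rul_predicted):
    """Average absolute prediction error of Eq. (64) over the whole testing signal."""
    return float(np.mean(ape(rul_real, rul_predicted)))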

The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b), for the continuous and the discrete observation cases, respectively. As it can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which, respectively, the prediction errors obtained for continuous and discrete observations are reported.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi.


Figure 5: Remaining Useful Lifetime estimation over time (true RUL, average RUL, and upper/lower RUL bounds). (a) RUL estimation for continuous data and Weibull duration distribution; (b) RUL estimation for discrete data and Gamma duration distribution. HSMMs effectively solve RUL estimation problems: the prediction converges to the actual RUL value and its uncertainty decreases as the real failure time approaches.

Table 1: State recognition accuracy (%).

(a) Continuous observations

Test case    Gaussian    Gamma    Weibull
1            99.4        98.5     99.2
2            99.7        98.6     99.5
3            99.4        99.2     99.7
4            98.9        98.9     99.7
5            98.2        98.9     100
6            99.1        98.8     99.7
7            98.5        99.4     99.7
8            99.2        99.1     99.5
9            99.2        98.6     99.7
10           99.2        99.1     99.5
Average      99.1        98.9     99.6

(b) Discrete observations

Test case    Gaussian    Gamma    Weibull
1            97.4        96.7     97.4
2            97.2        97.6     96.5
3            99.4        95.8     96.6
4            98.2        95.3     97.7
5            99.1        97.4     97.5
6            97.8        97.7     97.8
7            95.8        97.2     96.6
8            97.7        96.4     97.2
9            98.9        97.2     98.5
10           99.2        95.6     96.9
Average      98.1        96.7     97.3

This is mainly due to the proposed average state duration of (20), compared to the one of Azimi given by (16).

5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comté Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr), Besançon, France, with the aim of collecting real data related to accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].

The Pronostia platform allows run-to-failure tests to be performed under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).


Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20).

(a) APE of the RUL estimation for the continuous observation test cases

             Gaussian                      Gamma                         Weibull
Test case    APE_avg  APE_up   APE_low    APE_avg  APE_up   APE_low     APE_avg  APE_up   APE_low
1            5.1      17.0     6.7        14.0     29.0     0.91        4.5      17.0     8.1
2            7.6      19.0     5.0        6.1      21.0     8.5         6.6      19.0     6.1
3            7.7      5.4      19.0       2.9      12.0     17.0        16.0     29.0     3.0
4            9.0      21.0     2.9        7.5      22.0     6.8         6.0      19.0     6.7
5            7.3      19.0     4.7        2.2      14.0     14.0        3.9      17.0     8.7
6            6.5      18.0     5.6        5.1      18.0     10.0        14.0     27.0     2.7
7            4.7      16.0     7.5        4.8      17.0     11.0        1.2      13.0     12.0
8            10.0     22.0     2.9        5.2      18.0     10.0        9.2      22.0     3.9
9            3.1      9.2      14.0       2.0      16.0     13.0        8.2      21.0     4.9
10           6.4      18.0     5.6        7.5      22.0     6.9         3.3      12.0     13.0
Average      6.8      17.0     7.4        5.7      19.0     9.9         7.3      20.0     7.0

(b) APE of the RUL estimation for the discrete observation test cases

             Gaussian                      Gamma                         Weibull
Test case    APE_avg  APE_up   APE_low    APE_avg  APE_up   APE_low     APE_avg  APE_up   APE_low
1            2.1      11.0     14.0       3.1      8.8      14.0        2.4      12.0     13.0
2            2.1      11.0     13.0       11.0     22.0     3.3         19.0     32.0     7.1
3            5.1      17.0     7.6        6.6      18.0     5.1         2.3      14.0     11.0
4            5.9      6.5      18.0       5.2      17.0     6.7         4.2      16.0     9.0
5            3.2      14.0     10.0       8.3      19.0     3.4         12.0     24.0     2.9
6            12.0     24.0     2.7        6.2      18.0     5.2         4.1      8.4      16.0
7            2.9      15.0     9.7        9.3      21.0     2.3         19.0     31.0     6.6
8            15.0     27.0     7.0        7.4      18.0     4.3         4.3      17.0     9.4
9            5.9      18.0     7.7        11.0     23.0     5.5         3.9      16.0     8.8
10           3.5      11.0     14.0       5.5      6.0      16.0        5.2      17.0     7.1
Average      5.7      15.0     10.0       7.4      17.0     6.6         7.7      19.0     9.0

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human machine interface.

The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.

Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32].

(a) APE of the RUL estimation for the continuous observation test cases

             Gaussian                      Gamma                         Weibull
Test case    APE_avg  APE_up   APE_low    APE_avg  APE_up   APE_low     APE_avg  APE_up   APE_low
1            57.8     51.0     66.8       26.2     9.7      52.7        25.9     28.4     64.6
2            50.2     44.4     57.7       21.3     17.0     46.9        29.0     19.2     70.8
3            50.3     44.7     57.3       27.1     8.7      56.5        34.5     13.9     73.4
4            51.8     46.0     60.4       21.3     14.3     45.9        34.9     17.1     78.7
5            59.4     53.7     66.2       29.0     9.5      55.4        33.4     15.6     74.9
6            58.0     51.7     67.1       25.8     8.3      54.1        23.1     25.8     66.5
7            59.4     53.6     66.9       18.2     12.5     47.7        36.0     17.1     74.4
8            63.4     55.6     72.3       19.4     15.7     44.1        34.8     17.8     77.0
9            49.1     43.5     57.0       14.5     17.1     43.2        25.1     26.7     67.0
10           54.4     48.4     62.8       23.2     7.9      52.7        24.1     24.5     67.4
Average      55.4     49.3     63.5       22.6     12.1     49.9        30.1     20.6     71.5

(b) APE of the RUL estimation for the discrete observation test cases

             Gaussian                      Gamma                         Weibull
Test case    APE_avg  APE_up   APE_low    APE_avg  APE_up   APE_low     APE_avg  APE_up   APE_low
1            51.4     41.0     62.4       42.4     31.8     53.0        32.6     26.4     73.6
2            49.6     39.9     60.4       59.5     48.3     70.8        31.3     27.6     69.3
3            50.2     38.6     62.3       46.5     35.7     57.4        32.4     25.7     70.2
4            42.2     31.5     53.8       50.1     40.5     60.6        23.7     36.1     60.3
5            44.3     33.9     55.8       47.8     37.4     59.1        36.0     25.6     76.5
6            52.2     43.2     62.7       55.2     44.3     66.9        27.2     31.6     64.3
7            55.0     43.9     66.8       56.0     45.7     67.0        34.7     23.2     74.4
8            50.3     39.0     62.0       60.4     50.5     71.0        35.1     26.4     72.4
9            55.5     47.4     64.0       48.0     37.2     59.5        31.8     22.2     73.6
10           49.0     38.2     60.7       52.1     41.2     63.1        29.4     28.9     68.7
Average      50.0     39.7     61.1       51.8     41.3     62.9        31.4     27.4     70.4

The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1 (1800 rpm and 4000 N)    Condition 2 (1650 rpm and 4200 N)    Condition 3 (1500 rpm and 5000 N)
Bearing       Lifetime [s]           Bearing       Lifetime [s]           Bearing       Lifetime [s]
Bearing1_1    28030                  Bearing2_1    9110                   Bearing3_1    5150
Bearing1_2    8710                   Bearing2_2    7970                   Bearing3_2    16370
Bearing1_3    23750                  Bearing2_3    19550                  Bearing3_3    4340
Bearing1_4    14280                  Bearing2_4    7510
Bearing1_5    24630                  Bearing2_5    23110
Bearing1_6    24480                  Bearing2_6    7010
Bearing1_7    22590                  Bearing2_7    2300

Figure 6: Global overview of the Pronostia experimental platform, showing the rotating module, the load module, the tested bearing, and the data acquisition module [19].

Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;

(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;

(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal was higher than 20 g; this moment was thus defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, …, Bearing1_7 and Bearing2_1, Bearing2_2, …, Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), resembling faithfully a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in the experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].

In our experiments, we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots (L = 2560). Let r_w(t) be the raw signal of the w-th window; for each w we estimate the RMS as

x^{RMS}_w = \sqrt{\frac{1}{L} \sum_{t=1}^{L} r_w^2(t)}

and the kurtosis as

x^{KURT}_w = \frac{\frac{1}{L} \sum_{t=1}^{L} (r_w(t) - \bar{r}_w)^4}{\left(\frac{1}{L} \sum_{t=1}^{L} (r_w(t) - \bar{r}_w)^2\right)^2}

where \bar{r}_w is the mean of r_w. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM, after the model selection procedure we implemented a leave-one-out cross validation scheme: considering conditions 1 and 2 separately, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
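A minimal sketch of this windowed feature extraction (one RMS/kurtosis pair per 2560-sample snapshot) is:

import numpy as np

def extract_rms_kurtosis(raw_signal, window_length=2560):
    """Compute per-window RMS and (non-excess) kurtosis as defined above."""
    raw_signal = np.asarray(raw_signal, dtype=float)
    n_windows = len(raw_signal) // window_length
    windows = raw_signal[:n_windows * window_length].reshape(n_windows, window_length)
    rms = np.sqrt(np.mean(windows ** 2, axis=1))
    centered = windows - windows.mean(axis=1, keepdims=True)
    kurtosis = np.mean(centered ** 4, axis=1) / np.mean(centered ** 2, axis=1) ** 2
    return np.column_stack([rms, kurtosis])       # one observation vector x_w per window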

5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (being Gaussian, Gamma, or Weibull), (ii) an increasing number of states N from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations \lambda^0, on the data sets (Bearing1_1, Bearing1_2, …, Bearing1_7 and Bearing2_1, Bearing2_2, …, Bearing2_7).


Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal over the whole test [49].

Figure 8: Raw vibration data r(t) (a) versus the RMS and kurtosis features extracted window by window (b) for Bearing1_1.

Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 Gaussian mixture for the observation density.

(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme by using, for condition 1, at each iteration, Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters \lambda^*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As it can be seen, the average as well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.


Figure 9: AIC values for the Gaussian, Gamma, and Weibull duration models as a function of the number of states. (a) AIC values for condition 1; (b) AIC values for condition 2. In both cases, the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.

Figure 10: Online RUL estimation (true RUL, average RUL, and upper/lower RUL bounds). (a) RUL estimation for Bearing1_7; (b) RUL estimation for Bearing2_6. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.

Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As it can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.

6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time to event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required parameters to estimate.


Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1

Test bearing    APE_avg    APE_low    APE_up
Bearing1_1      10571.6    12723.0    9414.6
Bearing1_2      4331.2     3815.6     3821.3
Bearing1_3      2997.0     9730.9     6091.2
Bearing1_4      6336.3     2876.6     14871.9
Bearing1_5      1968.9     7448.4     10411.5
Bearing1_6      4253.0     9896.4     9793.7
Bearing1_7      1388.0     7494.3     10088.1
Average         4549.4     7712.2     9213.2

(b) Condition 2

Test bearing    APE_avg    APE_low    APE_up
Bearing2_1      2475.9     5006.5     7287.5
Bearing2_2      1647.3     4497.2     8288.6
Bearing2_3      8877.1     9508.3     7962.1
Bearing2_4      1769.8     4248.6     4982.5
Bearing2_5      8663.1     10490.0    10730.0
Bearing2_6      877.1      3504.7     6687.0
Bearing2_7      3012.5     3866.4     6651.9
Average         3903.3     5874.5     7512.8

Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM, in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM, combined with AIC, can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as

\hat{d}_{t+1}(i) = \frac{a_{ii}(\hat{d}_t) \cdot \alpha_t(i) \cdot b_i(x_{t+1})}{\alpha_{t+1}(i)} \cdot (\hat{d}_t(i) + 1)  (A.1)

The random variable d_t(i) has been defined in Section 2.1 as the duration spent in state i prior to the current time t, assuming that the state at the current time t is i. d_t(i) is sampled from an arbitrary distribution:

d_t(i) \sim f(d)  (A.2)

We can specify the probability that the system has been in state i for d time units prior to the current time t, given the observations and the model parameters \lambda, and knowing that the current state is i, as

P(d_t(i) = d) = P(s_{t-d-1} \ne S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t, \lambda)  (A.3)

We omit the conditioning on the model parameters \lambda in the following equations, being inherently implied. We are interested in deriving the estimator \hat{d}_t(i) of d_t(i), defined as its expected value (see Equation (15)):

\hat{d}_t(i) = E(d_t(i) \mid s_t = S_i, x_1 x_2 \cdots x_t), \quad 1 \le i \le N  (A.4)

From the definition of expectation we have

\hat{d}_t(i) = \sum_{d=1}^{t} d \cdot P(d_t(i) = d) = \sum_{d=1}^{t} d \cdot P(s_{t-d-1} \ne S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t)  (A.5)

For \hat{d}_{t+1}(i) we have

\hat{d}_{t+1}(i) = \sum_{d=1}^{t+1} d \cdot P(s_{t-d} \ne S_i, s_{t-d+1} = S_i, \ldots, s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})
= \underbrace{P(s_{t-1} \ne S_i, s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})}_{(a)}  (A.6)
+ \sum_{d=2}^{t+1} d \cdot P(s_{t-d} \ne S_i, s_{t-d+1} = S_i, \ldots, s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})  (A.7)

By noticing that

P(s_{t-d} \ne S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, x_1, \ldots, x_{t+1})
= \frac{P(s_{t-d} \ne S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})}{P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})}  (A.8)


we can replace the probability of the second term of (A.7) with

P(s_{t-d} \ne S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})
= \underbrace{P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})}_{(b)}  (A.9)
\cdot P(s_{t-d} \ne S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, x_1, \ldots, x_{t+1})  (A.10)

In the last factor of (A.10) we can omit the information about the current state and observation by observing that

P(s_{t-d} \ne S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, x_1, \ldots, x_{t+1})
\approx \underbrace{P(s_{t-d} \ne S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t)}_{(c)}  (A.11)

if the following independencies hold:

s_{t+1} \perp \{s_{t-d+1}, \ldots, s_{t-1}\} \mid s_t, x_1, \ldots, x_t
x_{t+1} \perp \{s_{t-d+1}, \ldots, s_{t-1}\} \mid s_t, x_1, \ldots, x_t  (A.12)

where with \perp we denote independency. Equations (A.12) hold for HMMs (even without conditioning on x_1, \ldots, x_t), but they do not hold for HSMMs, since the state duration (expressed by s_{t-d+1}, \ldots, s_{t-1}) determines the system evolution. On the other hand, the state duration is partially known through the observations x_1, \ldots, x_t; thus the approximation is reasonable as long as the uncertainty on the states is within limits. From (A.6), (A.9), and (A.11) we obtain

\hat{d}_{t+1}(i) = (a) + \sum_{d=2}^{t+1} d \cdot (b) \cdot (c)

= \underbrace{P(s_{t-1} \ne S_i, s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})}_{P(A, B \mid C) = P(A \mid B, C) \cdot P(B \mid C)}
+ \sum_{d=2}^{t+1} d \cdot \underbrace{P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})}_{\text{does not depend on } d} \cdot P(s_{t-d} \ne S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t)

= P(s_{t-1} \ne S_i \mid s_t = S_i, s_{t+1} = S_i, x_1, \ldots, x_{t+1}) \cdot P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})
+ P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1}) \cdot \sum_{d=2}^{t+1} d \cdot P(s_{t-d} \ne S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t)

= P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1}) \cdot \Big[ \underbrace{P(s_{t-1} \ne S_i \mid s_t = S_i, s_{t+1} = S_i, x_1, \ldots, x_t, x_{t+1})}_{\approx P(s_{t-1} \ne S_i \mid s_t = S_i, x_1, \ldots, x_t) \text{ by the approximation of (A.11)}}
+ \sum_{d=2}^{t+1} d \cdot P(s_{t-d} \ne S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t) \Big]

= P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1}) \cdot \Big[ P(s_{t-1} \ne S_i \mid s_t = S_i, x_1, \ldots, x_t)
+ \sum_{d'=1}^{t} (d' + 1) \cdot P(s_{t-d'-1} \ne S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t) \Big]  (A.13)

Noticing that

\sum_{d'=1}^{t} P(s_{t-d'-1} \ne S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t) + P(s_{t-1} \ne S_i \mid s_t = S_i, x_1, \ldots, x_t) = 1  (A.14)

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time t, we can rewrite (A.13) as follows:

\hat{d}_{t+1}(i) = P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1}) \cdot (\hat{d}_t(i) + 1)  (A.15)

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state i in the previous step.

In order to transform (A.15) in terms of model parameters, for an easy numerical calculation of the induction for \hat{d}_{t+1}(i), we can consider the following equality:

P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1}) = \frac{P(s_t = S_i, s_{t+1} = S_i \mid x_1, \ldots, x_{t+1})}{\underbrace{P(s_{t+1} = S_i \mid x_1, \ldots, x_{t+1})}_{\gamma_{t+1}(i)}}  (A.16)

If we consider the terms involved in the probability at the numerator of the right-hand side of (A.16), we have that

\underbrace{x_1, \ldots, x_t}_{B} \perp \underbrace{x_{t+1}}_{C} \mid \underbrace{s_t = S_i, s_{t+1} = S_i}_{A}  (A.17)


If B \perp C \mid A, by the Bayes rule we have that

P(A \mid C, B) = \frac{P(C \mid A, B) \cdot P(A \mid B)}{P(C \mid B)}  (A.18)

Hence we can rewrite the numerator of the right-hand side of (A.16) as follows:

P(s_t = S_i, s_{t+1} = S_i \mid x_1, \ldots, x_{t+1})
= \frac{P(s_t = S_i, s_{t+1} = S_i \mid x_1, \ldots, x_t) \cdot \overbrace{P(x_{t+1} \mid s_t = S_i, s_{t+1} = S_i)}^{x_{t+1} \perp s_t \mid s_{t+1}}}{P(x_{t+1} \mid x_1, \ldots, x_t)}
= \frac{P(s_{t+1} = S_i \mid s_t = S_i, x_1, \ldots, x_t) \cdot \overbrace{P(s_t = S_i \mid x_1, \ldots, x_t)}^{\gamma_t(i)} \cdot \overbrace{P(x_{t+1} \mid s_{t+1} = S_i)}^{b_i(x_{t+1})}}{P(x_{t+1} \mid x_1, \ldots, x_t)}  (A.19)

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

P(s_{t+1} = S_i \mid s_t = S_i, x_1, \ldots, x_t) = \sum_{d_t} a_{ii}(d_t) \cdot P(d_t \mid x_1, \ldots, x_t) \approx a_{ii}(\hat{d}_t)  (A.20)

while the denominator of (A.19) can be expressed as follows:

P(x_{t+1} \mid x_1, \ldots, x_t) = \frac{P(x_1, \ldots, x_t, x_{t+1})}{P(x_1, \ldots, x_t)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}  (A.21)

By substituting (A.20) and (A.21) in (A.19) we obtain

P(s_t = S_i, s_{t+1} = S_i \mid x_1, \ldots, x_{t+1}) = \frac{a_{ii}(\hat{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(x_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)}  (A.22)

and then, by combining (A.22) and (A.16), we obtain

P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1}) = \frac{a_{ii}(\hat{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(x_{t+1})}{\gamma_{t+1}(i) \cdot \sum_{i=1}^{N} \alpha_{t+1}(i)}  (A.23)

Finally, by substituting (A.23) in (A.15) and considering that

\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)}  (A.24)

we derive the induction formula for \hat{d}_{t+1}(i) in terms of model parameters as

\hat{d}_{t+1}(i) = \frac{a_{ii}(\hat{d}_t) \cdot \alpha_t(i) \cdot b_i(x_{t+1})}{\alpha_{t+1}(i)} \cdot (\hat{d}_t(i) + 1)  (A.25)
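Numerically, one induction step of (A.25) is a single vectorized operation. The sketch below assumes that the forward variables \alpha_t and \alpha_{t+1}, the duration-dependent self-transition probabilities a_ii(\hat{d}_t), and the emission likelihoods b_i(x_{t+1}) are already available from the forward pass; the names are illustrative only.

import numpy as np

def duration_update(d_hat_t, alpha_t, alpha_t1, a_ii_dt, b_x_t1):
    """One step of Eq. (A.25), computed for all states i at once.
    All arguments are length-N arrays: a_ii_dt[i] = a_ii(d_hat_t), b_x_t1[i] = b_i(x_{t+1})."""
    d_hat_t = np.asarray(d_hat_t, dtype=float)
    return a_ii_dt * alpha_t * b_x_t1 / alpha_t1 * (d_hat_t + 1.0)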

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka, Japan and Kariya, Japan, October-November 1989), E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines – prognostics – part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.

[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hide Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.

[41] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 2006.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.



2.2.3. The Training Algorithm. The training algorithm consists of estimating the model parameters from the observation data. As discussed in Section 2.1, a parametric HSMM is defined by $\lambda = (\mathbf{A}^0, \Theta, C, \mu, U, \pi)$ if the observations are continuous, or by $\lambda = (\mathbf{A}^0, \Theta, B, \pi)$ if the observations are discrete. Given a generic observation sequence $\mathbf{x} = \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_T$, referred to as the training set in the following, the training procedure consists of finding the model parameter set $\lambda^*$ which locally maximizes the model likelihood $P(\mathbf{x} \mid \lambda)$.

We use the modified Baum-Welch algorithm of Azimi et al. [30–32]. However, in our implementation we make no assumption on the density function used to model the state duration, and we consider both continuous and discrete observations.

Being a variant of the more general Expectation-Maximization (EM) algorithm, Baum-Welch is an iterative procedure which consists of two steps: (i) the expectation step, in which the forward and backward variables are calculated and the model likelihood is estimated, and (ii) the maximization step, in which the model parameters are updated and used in the next iteration. This process usually starts from a random guess of the model parameters $\lambda_0$ and is iterated until the likelihood function no longer improves between two consecutive iterations.

Similarly to HMMs, the reestimation formulas are derived by firstly introducing the variable $\xi_t(i,j)$, which represents the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$, given the model and the observation sequence:

$$\xi_t(i,j) = P\left(s_t = S_i, s_{t+1} = S_j \mid \mathbf{x}, \lambda\right). \qquad (36)$$

However, in the HSMM case the variable $\xi_t(i,j)$ considers the duration estimation performed in the forward algorithm (see Equation (24)). Formulated in terms of the forward and backward variables, it is given by

$$\xi_t(i,j) = P\left(s_t = S_i, s_{t+1} = S_j \mid \mathbf{x}, \lambda\right)
= \frac{P\left(s_t = S_i, s_{t+1} = S_j, \mathbf{x} \mid \lambda\right)}{P(\mathbf{x} \mid \lambda)}
= \frac{\alpha_t(i)\, a_{ij}(\mathbf{d}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}{P(\mathbf{x} \mid \lambda)}
= \frac{\alpha_t(i)\, a_{ij}(\mathbf{d}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\, a_{ij}(\mathbf{d}_t)\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)}. \qquad (37)$$

From $\xi_t(i,j)$ we can derive the quantity $\gamma_t(i)$ (already defined in (17)), representing the probability of being in state $S_i$ at time $t$, given the observation sequence and the model parameters:

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j). \qquad (38)$$

Finally, the reestimation formulas for the parameters $\pi$ and $\mathbf{A}^0$ are given by

$$\pi_i = \gamma_1(i), \qquad (39)$$

$$a^0_{ij} = \frac{\left(\sum_{t=1}^{T-1} \xi_t(i,j)\right) \odot G}{\sum_{j=1}^{N}\left(\sum_{t=1}^{T-1} \xi_t(i,j)\right) \odot G}, \qquad (40)$$

where $G = [g_{ij}]$ is a square matrix of dimensions $N \times N$, with $g_{ij} = 0$ for $i = j$ and $g_{ij} = 1$ for $i \neq j$, $\odot$ represents the element-by-element product between two matrices, $\sum_{t=1}^{T-1}\gamma_t(i)$ is the expected number of transitions from state $S_i$, and $\sum_{t=1}^{T-1}\xi_t(i,j)$ is the expected number of transitions from state $S_i$ to state $S_j$.

Equation (39) represents the expected number of times that the model starts in state $S_i$, while (40) represents the expected number of transitions from state $S_i$ to state $S_j$, with $i \neq j$, over the total expected number of transitions from state $S_i$ to any other state different from $S_i$.

The matrix $\mathbf{A}^0$ being normalized, the stochastic constraints are satisfied at each iteration, that is, $\sum_{j=1}^{N} a^0_{ij} = 1$ for each $1 \le i \le N$, while the estimation of the prior probability $\pi_i$ inherently sums up to 1 at each iteration, since it represents the expected frequency in state $S_i$ at time $t = 1$, for each $1 \le i \le N$.
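For illustration, these updates can be written in a few lines of NumPy. The sketch below is illustrative only, not an exact transcription of our implementation; it assumes that the duration-dependent forward variables alpha, the backward variables beta, the transition terms a_ij(d_t) already evaluated at the estimated durations (a_d), and the emission likelihoods b have been precomputed.

    import numpy as np

    def reestimate_pi_A0(alpha, beta, a_d, b):
        # alpha, beta: (T, N) forward/backward variables; b: (T, N) emission likelihoods
        # a_d: (T-1, N, N) transition probabilities a_ij(d_t) at the estimated durations
        T, N = alpha.shape
        # xi[t, i, j] proportional to alpha_t(i) a_ij(d_t) b_j(x_{t+1}) beta_{t+1}(j), Eq. (37)
        xi = alpha[:-1, :, None] * a_d * b[1:, None, :] * beta[1:, None, :]
        xi /= xi.sum(axis=(1, 2), keepdims=True)     # normalization in Eq. (37)
        gamma = xi.sum(axis=2)                       # Eq. (38)
        pi = gamma[0]                                # Eq. (39)
        G = 1.0 - np.eye(N)                          # g_ij = 0 on the diagonal, 1 elsewhere
        num = xi.sum(axis=0) * G                     # numerator of Eq. (40)
        A0 = num / num.sum(axis=1, keepdims=True)    # row-normalized nonrecurrent matrix
        return xi, gamma, pi, A0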

With respect to the reestimation of the state duration parameters $\Theta$, we firstly estimate the mean $\mu_{i_d}$ and the variance $\sigma^2_{i_d}$ of the $i$th state duration, for each $1 \le i \le N$, from the forward and backward variables and the estimation of the state duration variable:

$$\mu_{i_d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1,\, j \neq i}^{N} a_{ij}(\hat{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right) \hat{d}_t(i)}{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1,\, j \neq i}^{N} a_{ij}(\hat{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)}, \qquad (41)$$

$$\sigma^2_{i_d} = \frac{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1,\, j \neq i}^{N} a_{ij}(\hat{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right) \left(\hat{d}_t(i) - \mu_{i_d}\right)^2}{\sum_{t=1}^{T-1} \alpha_t(i) \left(\sum_{j=1,\, j \neq i}^{N} a_{ij}(\hat{d}_t(i))\, b_j(\mathbf{x}_{t+1})\, \beta_{t+1}(j)\right)}, \qquad (42)$$

where (41) can be interpreted as the probability of transition from state $S_i$ to $S_j$, with $i \neq j$, at time $t$, weighted by the duration of state $S_i$ at $t$, giving the desired expected value, while in (42) the same quantity is weighted by the squared distance of the duration at time $t$ from its mean, giving the estimation of the variance.

The parameters of the desired duration distribution can then be estimated from $\mu_{i_d}$ and $\sigma^2_{i_d}$. For example, if a Gamma distribution with shape parameter $\nu$ and scale parameter $\eta$ is chosen to model the state duration, the parameters $\nu_i$ and $\eta_i$, for each $1 \le i \le N$, can be calculated as $\nu_i = \mu^2_{i_d} / \sigma^2_{i_d}$ and $\eta_i = \sigma^2_{i_d} / \mu_{i_d}$.
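This moment-matching step is straightforward to compute; the following minimal sketch (illustrative helper names, not part of the original formulation) maps the estimated duration mean and variance to Gamma shape and scale parameters.

    def gamma_duration_params(mu_d, var_d):
        # Method-of-moments mapping used for the state duration model:
        # shape nu = mu^2 / sigma^2, scale eta = sigma^2 / mu.
        nu = mu_d ** 2 / var_d
        eta = var_d / mu_d
        return nu, eta

    # Example: a mean duration of 90 with variance 15 gives shape 540 and scale 0.1667.
    nu_i, eta_i = gamma_duration_params(90.0, 15.0)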


Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the reestimation formulas are the same as for Hidden Markov Models [13].

In particular, for continuous observations, the parameters of the Gaussian mixture defined in (9) are reestimated by firstly defining the probability of being in state $S_j$ at time $t$, with the probability of the observation vector $\mathbf{x}_t$ evaluated by the $k$th mixture component, as

$$\gamma_t(j,k) = \left[\frac{\alpha_t(j)\,\beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}\right] \cdot \left[\frac{c_{jk}\, \mathcal{N}\left(\mathbf{x}_t, \mu_{jk}, U_{jk}\right)}{\sum_{m=1}^{M} c_{jm}\, \mathcal{N}\left(\mathbf{x}_t, \mu_{jm}, U_{jm}\right)}\right]. \qquad (43)$$

By using the former quantity, the parameters $c_{jk}$, $\mu_{jk}$, and $U_{jk}$ are reestimated through the following formulas:

$$c_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T}\sum_{m=1}^{M} \gamma_t(j,m)}, \qquad
\mu_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k) \cdot \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(j,k)}, \qquad
U_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k) \cdot \left(\mathbf{x}_t - \mu_{jk}\right)\left(\mathbf{x}_t - \mu_{jk}\right)^T}{\sum_{t=1}^{T} \gamma_t(j,k)}, \qquad (44)$$

where the superscript $T$ denotes vector transpose.

For discrete observations, the reestimation formula for the observation matrix $b_j(l)$ is

$$b_j(l) = \frac{\sum_{t=1,\, x_t = X_l}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \qquad (45)$$

where the quantity $\gamma_t(j)$, which takes into account the duration-dependent forward variable $\alpha_t(j)$, is calculated through (17). The reader is referred to Rabiner's work [13] for the interpretation of the observation parameters reestimation formulas.
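A possible NumPy rendering of the mixture updates (43)-(44) is sketched below. It is an illustration under stated assumptions rather than our exact code: the forward and backward variables are assumed precomputed, and SciPy's multivariate normal density is used for the mixture components.

    import numpy as np
    from scipy.stats import multivariate_normal

    def reestimate_mixture(x, alpha, beta, c, mu, U):
        # x: (T, O) observations; alpha, beta: (T, N) forward/backward variables
        # c: (N, M) mixture weights; mu: (N, M, O) means; U: (N, M, O, O) covariances
        T, N = alpha.shape
        M = c.shape[1]
        state_post = (alpha * beta) / (alpha * beta).sum(axis=1, keepdims=True)  # first bracket of (43)
        dens = np.zeros((T, N, M))
        for j in range(N):
            for k in range(M):
                dens[:, j, k] = c[j, k] * multivariate_normal.pdf(x, mu[j, k], U[j, k])
        resp = dens / dens.sum(axis=2, keepdims=True)      # second bracket of (43)
        g = state_post[:, :, None] * resp                  # gamma_t(j, k)
        w = g.sum(axis=0)                                  # (N, M) soft counts
        c_new = w / w.sum(axis=1, keepdims=True)           # Eq. (44), mixture weights
        mu_new = np.einsum('tjk,to->jko', g, x) / w[:, :, None]
        diff = x[:, None, None, :] - mu_new[None, :, :, :]
        U_new = np.einsum('tjk,tjko,tjkp->jkop', g, diff, diff) / w[:, :, None, None]
        return c_new, mu_new, U_new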

3. AIC-Based Model Selection

In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states $N$, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures $M$ to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been observed that, in the case of complex models and in the presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches such as the Bayesian Information Criterion.

In general, information criteria are represented as a two-term structure. They account for a compromise between a measure of model fitness, which is based on the likelihood of the model, and a penalty term which takes into account the model complexity. Usually, the model complexity is measured in terms of the number of parameters that have to be estimated and of the number of observations.

The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as

$$\text{AIC} = \frac{-\log L(\hat{\lambda}) + p}{T}, \qquad (46)$$

where $L(\hat{\lambda})$ is the likelihood of the model with the estimated parameters, as defined in (25), $p$ is the number of model parameters, and $T$ is the length of the observed sequence. The best model is the one minimizing (46).

Concerning $p$, the number of parameters to be estimated for a parametric HSMM with $N$ states is $p = p_h + p_o$, where $p_h$ are the parameters of the hidden states layer, while $p_o$ are those of the observation layer.

In particular, $p_h = (N-1) + (N-1) \cdot N + z \cdot N$, where

(i) $N-1$ accounts for the prior probabilities $\pi$;
(ii) $(N-1) \cdot N$ accounts for the nonrecurrent transition matrix $\mathbf{A}^0$;
(iii) $z \cdot N$ accounts for the duration probability, $z$ being the number of parameters $\theta$ of the duration distribution.

Concerning $p_o$, a distinction must be made between discrete and continuous observations:

(i) in the case of discrete observations with $L$ possible observable values, $p_o = (L-1) \cdot N$, which accounts for the elements of the observation matrix $B$;
(ii) if the observations are continuous and a multivariate mixture of $M$ Gaussians with $O$ variates is used as observation model, $p_o = [O \cdot N \cdot M] + [O \cdot O \cdot N \cdot M] + [(M-1) \cdot N]$, where each term accounts, respectively, for the mean vectors $\mu$, the covariance matrices $U$, and the mixture coefficients $C$.
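The parameter count and the criterion of (46) translate directly into code; the following minimal sketch uses helper names of our own choosing and is meant only as an illustration of the bookkeeping above.

    def hsmm_num_params(N, z, discrete, L=None, M=None, O=None):
        # p = p_h + p_o as described in Section 3.
        # N: states, z: parameters per duration distribution,
        # L: discrete symbols, M: mixture components, O: observation dimension.
        p_h = (N - 1) + (N - 1) * N + z * N
        if discrete:
            p_o = (L - 1) * N
        else:
            p_o = O * N * M + O * O * N * M + (M - 1) * N
        return p_h + p_o

    def aic(log_likelihood, p, T):
        # Eq. (46); the best model is the one minimizing this value.
        return (-log_likelihood + p) / T

    # Example: a continuous 5-state model with Gamma durations (z = 2),
    # M = 1 bivariate mixture component (O = 2), on a sequence of length T = 650.
    p = hsmm_num_params(N=5, z=2, discrete=False, M=1, O=2)
    # score = aic(log_likelihood, p, T=650)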

4. Remaining Useful Lifetime Estimation

One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time $D$ before entering a given state.

As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine and if the state $S_k$ that represents the failure condition is identified, at each moment the RUL can be defined as the expected time $D$ to reach the failure state


$S_k$. If we assume that the time to failure is a random variable $D$, following a determinate probability density, we define the RUL at the current time $t$ as

$$\text{RUL}_t = \bar{D} = \mathbb{E}\left(D \mid s_{t+D} = S_k,\; s_{t+D-1} = S_i\right), \quad 1 \le i, k \le N,\; i \neq k, \qquad (47)$$

where $\mathbb{E}$ denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.

The estimation of the current state is performed via the Viterbi path, that is, the variable $\delta_t = [\delta_t(i)]_{1 \le i \le N}$ defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable $\bar{\delta}_t(i)$, obtained as

$$\bar{\delta}_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P\left(s_t = S_i \mid s_1 s_2 \cdots s_{t-1}, \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t, \lambda\right) = \frac{\delta_t(i)}{\sum_{j=1}^{N} \delta_t(j)}, \quad 1 \le i \le N, \qquad (48)$$

that is, an estimate of the probability of being in state $S_i$ at time $t$.

Together with the normalized variable $\bar{\delta}_t(i)$, the maximum a posteriori estimate of the current state $s^*_t$ is taken into account, according to (34). If $s^*_t$ coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimation of the average remaining time in the current state, $d_{\text{avg}}(s^*_t)$, is calculated as

$$d_{\text{avg}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - \hat{d}_t(i)\right) \odot \bar{\delta}_t(i), \qquad (49)$$

where with $\mu_{d_i}$ we denote the expected value of the duration variable in state $S_i$, according to the duration distribution specified by the parameters $\theta_i$. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration $\hat{d}_t(i)$ at time $t$ from the expected sojourn time of state $S_i$, weighting the result by the uncertainty about the current state, $\bar{\delta}_t(i)$, and finally summing up the contributions from all states.

In addition to the average remaining time, a lower and an upper bound value can be calculated, based on the standard deviation $\sigma_{d_i}$ of the duration distribution for state $S_i$:

$$d_{\text{low}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i} - \hat{d}_t(i)\right) \odot \bar{\delta}_t(i), \qquad (50)$$

$$d_{\text{up}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i} - \hat{d}_t(i)\right) \odot \bar{\delta}_t(i). \qquad (51)$$

Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimation, as follows:

$$\delta_{\text{next}} = \left[\delta_{t+d}(i)\right]_{1 \le i \le N} = \left(\mathbf{A}^0\right)^T \cdot \bar{\delta}_t, \qquad (52)$$

while the maximum a posteriori estimate of the next state, $s^*_{\text{next}}$, is calculated as

$$s^*_{\text{next}} = s^*_{t+d} = \arg\max_{1 \le i \le N} \delta_{t+d}(i). \qquad (53)$$

Again, if $s^*_{t+d}$ coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimation of the failure time is $D_{\text{avg}} = d_{\text{avg}}(s^*_t)$, calculated at the previous step, with the bound values $D_{\text{low}} = d_{\text{low}}(s^*_t)$ and $D_{\text{up}} = d_{\text{up}}(s^*_t)$. Otherwise, the estimation of the sojourn time of the next state is calculated as follows:

$$d_{\text{avg}}(s^*_{t+d}) = \sum_{i=1}^{N} \mu_{d_i} \odot \delta_{t+d}(i), \qquad (54)$$

$$d_{\text{low}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i}\right) \odot \delta_{t+d}(i), \qquad (55)$$

$$d_{\text{up}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i}\right) \odot \delta_{t+d}(i). \qquad (56)$$

This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:

$$D_{\text{avg}} = \sum d_{\text{avg}}, \qquad (57)$$
$$D_{\text{low}} = \sum d_{\text{low}}, \qquad (58)$$
$$D_{\text{up}} = \sum d_{\text{up}}. \qquad (59)$$

Finally, Algorithm 1 details the above described RUL estimation procedure.

5. Experimental Results

To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


(1)  function RulEstimation(x_t, S_k)        ⊳ x_t: the last observation acquired
(2)                                          ⊳ S_k: the failure state
(3)  Initialization:
(4)    D_avg ← 0
(5)    D_low ← 0
(6)    D_up ← 0
(7)  Current state estimation:
(8)    Calculate δ̄_t                         ⊳ using (48)
(9)    Calculate s*_t                         ⊳ using (34)
(10)   Calculate d̂_t                         ⊳ using (20)
(11)   S ← s*_t
(12)  Loop:
(13)  while S ≠ S_k do
(14)    Calculate d_avg                       ⊳ using (49) or (54)
(15)    Calculate d_low                       ⊳ using (50) or (55)
(16)    Calculate d_up                        ⊳ using (51) or (56)
(17)    D_avg ← D_avg + d_avg
(18)    D_low ← D_low + d_low
(19)    D_up ← D_up + d_up
(20)    Calculate δ_next                      ⊳ using (52)
(21)    Calculate s*_next                     ⊳ using (53)
(22)    S ← s*_next
      end while
(23)  return D_avg, D_low, D_up

Algorithm 1: Remaining Useful Lifetime estimation (pseudocode).
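For readers who prefer runnable code, the following Python sketch mirrors Algorithm 1 under simplifying assumptions: the normalized Viterbi variable (48), the MAP state (34), and the duration estimates (20) are supplied as inputs, the first pass of the loop uses (49)-(51) and later passes use (54)-(56), and a left-right topology is assumed so that the failure state is eventually reached. It is an illustration, not a verbatim transcription of our implementation.

    import numpy as np

    def rul_estimation(delta_bar, s_star, d_hat, mu_d, sigma_d, A0, failure_state):
        # delta_bar: (N,) normalized Viterbi variable, Eq. (48)
        # s_star: MAP current state, Eq. (34); d_hat: (N,) durations spent, Eq. (20)
        # mu_d, sigma_d: (N,) mean and std of each state's duration distribution
        # A0: (N, N) nonrecurrent transition matrix
        D_avg = D_low = D_up = 0.0
        state, probs, first = s_star, delta_bar, True
        while state != failure_state:
            if first:
                # remaining time in the current state, Eqs. (49)-(51)
                D_avg += np.sum((mu_d - d_hat) * probs)
                D_low += np.sum((mu_d - sigma_d - d_hat) * probs)
                D_up  += np.sum((mu_d + sigma_d - d_hat) * probs)
                first = False
            else:
                # expected sojourn time of a future state, Eqs. (54)-(56)
                D_avg += np.sum(mu_d * probs)
                D_low += np.sum((mu_d - sigma_d) * probs)
                D_up  += np.sum((mu_d + sigma_d) * probs)
            probs = A0.T @ probs            # Eq. (52)
            state = int(np.argmax(probs))   # Eq. (53)
        return D_avg, D_low, D_up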

5.1. Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.

5.1.1. Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with $N = 5$ states, having state $S_5$ as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective to solve automatic model selection, online condition monitoring, and prediction problems. For this purpose, we divided the experiments into two cases, according to the nature of the observations, being continuous or discrete.

For each of the continuous and the discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains $T = 650$ observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM, as follows:

$$\pi = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad
\mathbf{A}^0 = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix},$$

$$\Theta_{\mathcal{N}} = \left\{\theta_1 = [100, 20],\; \theta_2 = [90, 15],\; \theta_3 = [100, 20],\; \theta_4 = [80, 25],\; \theta_5 = [200, 1]\right\},$$

$$\Theta_{\mathcal{G}} = \left\{\theta_1 = [500, 0.2],\; \theta_2 = [540, 0.1667],\; \theta_3 = [500, 0.2],\; \theta_4 = [256, 0.3125],\; \theta_5 = [800, 0.005]\right\},$$

$$\Theta_{\mathcal{W}} = \left\{\theta_1 = [102, 28],\; \theta_2 = [92, 29],\; \theta_3 = [102, 28],\; \theta_4 = [82, 20],\; \theta_5 = [200, 256]\right\}, \qquad (60)$$

where $\Theta_{\mathcal{N}}$, $\Theta_{\mathcal{G}}$, and $\Theta_{\mathcal{W}}$ are the distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean $\mu_d$ and the variance $\sigma^2_d$ of the Gaussian distribution, the shape $\nu_d$ and the scale $\eta_d$ of the Gamma distribution, and the scale $a_d$ and the shape $b_d$ of the Weibull distribution.

[Figure 2: Examples of the data generated with the parameters described in Section 5.1.1. (a) Continuous case: hidden state sequence, state durations, and observed signal. (b) Discrete case: hidden state sequence, state durations, and observed symbols.]

It must be noticed that, as explained in Section 2.1, for state $S_5$, being the absorbing state, the duration parameters $\theta_5$ have no influence on the data, since once the state $S_5$ is reached the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:

bivariate Gaussian distribution has been used with the fol-lowing parameters [15]

1205831= [

20

20] 120583

2= [

20

35] 120583

3= [

35

35]

1205835= [

28

28]

1198801= [

20 0

0 20] 119880

2= [

15 0

0 15] 119880

3= [

15 minus2

minus2 15]

1198804= [

5 0

0 5] 119880

5= [

10 3

3 10]

(61)

while for the discrete case $L = 7$ distinct observation symbols have been taken into consideration, with the following observation probability distribution:

$$B = \begin{bmatrix}
0.8 & 0.2 & 0 & 0 & 0 & 0 & 0 \\
0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0 \\
0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0 \\
0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0 \\
0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1
\end{bmatrix}. \qquad (62)$$

An example of simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
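A minimal sampler for the continuous case with Gaussian durations is sketched below to make the generation procedure concrete; the rounding and truncation of the sampled durations are our own assumptions, since such details are not prescribed above, and analogous samplers for Gamma or Weibull durations only change the duration draw.

    import numpy as np

    def simulate_left_right_hsmm(pi, A0, dur_mean, dur_var, means, covs, T=650, seed=0):
        # Illustrative generator for a left-right HSMM with Gaussian state durations
        # and bivariate Gaussian emissions, cf. the parameters of Eqs. (60)-(61).
        rng = np.random.default_rng(seed)
        states, obs = [], []
        s = rng.choice(len(pi), p=pi)
        while len(states) < T:
            if s == len(pi) - 1:                      # absorbing failure state
                d = T - len(states)
            else:
                d = max(1, int(round(rng.normal(dur_mean[s], np.sqrt(dur_var[s])))))
            for _ in range(min(d, T - len(states))):
                states.append(s)
                obs.append(rng.multivariate_normal(means[s], covs[s]))
            s = rng.choice(len(pi), p=A0[s])          # next state from the A^0 row
        return np.array(states), np.array(obs)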

5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection problem. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states $N$ from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures $M$ in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fit [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters $\lambda_0$, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters $\lambda^*$ corresponding to the minimum AIC value has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.

The obtained results are shown in Figure 3 for both the continuous and discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered as the optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered as an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.

5.1.3. Condition Monitoring. The optimal parameters $\lambda^*$ obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, $\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t$. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state $s^*_t = \arg\max_{1 \le i \le N}[\delta_t(i)]$, as specified in (34).

[Figure 3: Akaike Information Criterion (AIC) applied to continuous and discrete observation data. Panels (a)-(c): AIC values for continuous data with Gaussian, Gamma, and Weibull duration distributions; panels (d)-(f): AIC values for discrete data with Gaussian, Gamma, and Weibull duration distributions. Each panel plots the AIC against the number of states (2 to 8) for the three candidate duration families. AIC is effective for automatic model selection, since its minimum value identifies the same number of states and duration model used to generate the data.]

An example of execution of the condition monitoring experiment is shown in Figure 4 for continuous and discrete observations, respectively. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, while the third display represents the observed time series.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of the testing cases.

[Figure 4: Condition monitoring using the Viterbi path. (a) State estimation for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992). Each panel shows the true state sequence, the estimated states (correct and wrong guesses), and the observations. HSMMs can be effective to solve condition monitoring problems in time-dependent applications due to their high accuracy in hidden state recognition.]

The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.

5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase we considered the state $S_5$ as the failure state and the trained parameters $\lambda^*$ of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations $\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t$ up to time $t$. When a new observation is acquired, after the current state probability $\bar{\delta}_t(i)$ is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average as well as the lower and the upper bound estimations converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches and the estimation becomes more precise, until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.

To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time $t$, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

$$\text{APE}(t) = \left|\text{RUL}_{\text{real}}(t) - \text{RUL}(t)\right|, \qquad (63)$$

where $\text{RUL}_{\text{real}}(t)$ is the (known) value of the RUL at time $t$, while $\text{RUL}(t)$ is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

$$\overline{\text{APE}} = \frac{\sum_{t=1}^{T} \text{APE}(t)}{T}, \qquad (64)$$

where $T$ is the length of the testing signal. APE being a prediction error, average values of (64) close to zero correspond to good predictive performance.
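The two metrics translate directly into code; the following minimal sketch (with helper names of our own choosing) evaluates (63) and (64) for a test history.

    import numpy as np

    def absolute_prediction_error(rul_real, rul_pred):
        # Eq. (63): pointwise absolute error between the true and predicted RUL.
        return np.abs(np.asarray(rul_real) - np.asarray(rul_pred))

    def mean_ape(rul_real, rul_pred):
        # Eq. (64): average absolute prediction error over the whole test signal.
        return absolute_prediction_error(rul_real, rul_pred).mean()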

The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b) for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), which report the prediction errors obtained for continuous and discrete observations, respectively.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi et al.

[Figure 5: Remaining Useful Lifetime estimation over time (true, average, upper, and lower RUL). (a) Continuous data and Weibull duration distribution; (b) discrete data and Gamma duration distribution. HSMMs effectively solve RUL estimation problems: the prediction converges to the actual RUL value and its uncertainty decreases as the real failure time approaches.]

Table 1: State recognition accuracy (%).

(a) Continuous observations

Test case   Gaussian   Gamma   Weibull
1           99.4       98.5    99.2
2           99.7       98.6    99.5
3           99.4       99.2    99.7
4           98.9       98.9    99.7
5           98.2       98.9    100
6           99.1       98.8    99.7
7           98.5       99.4    99.7
8           99.2       99.1    99.5
9           99.2       98.6    99.7
10          99.2       99.1    99.5
Average     99.1       98.9    99.6

(b) Discrete observations

Test case   Gaussian   Gamma   Weibull
1           97.4       96.7    97.4
2           97.2       97.6    96.5
3           99.4       95.8    96.6
4           98.2       95.3    97.7
5           99.1       97.4    97.5
6           97.8       97.7    97.8
7           95.8       97.2    96.6
8           97.7       96.4    97.2
9           98.9       97.2    98.5
10          99.2       95.6    96.9
Average     98.1       96.7    97.3

This is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi et al. given by (16).

5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr), Besancon, France, with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostic, and prognostic [19, 51–59].

The Pronostia platform allows to perform run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20).

(a) APE of the RUL estimation for the continuous observation test cases

Test    Gaussian                  Gamma                     Weibull
case    avg    up     low        avg    up     low        avg    up     low
1       5.1    17.0   6.7        14.0   29.0   0.91       4.5    17.0   8.1
2       7.6    19.0   5.0        6.1    21.0   8.5        6.6    19.0   6.1
3       7.7    5.4    19.0       2.9    12.0   17.0       16.0   29.0   3.0
4       9.0    21.0   2.9        7.5    22.0   6.8        6.0    19.0   6.7
5       7.3    19.0   4.7        2.2    14.0   14.0       3.9    17.0   8.7
6       6.5    18.0   5.6        5.1    18.0   10.0       14.0   27.0   2.7
7       4.7    16.0   7.5        4.8    17.0   11.0       1.2    13.0   12.0
8       10.0   22.0   2.9        5.2    18.0   10.0       9.2    22.0   3.9
9       3.1    9.2    14.0       2.0    16.0   13.0       8.2    21.0   4.9
10      6.4    18.0   5.6        7.5    22.0   6.9        3.3    12.0   13.0
Average 6.8    17.0   7.4        5.7    19.0   9.9        7.3    20.0   7.0

(b) APE of the RUL estimation for the discrete observation test cases

Test    Gaussian                  Gamma                     Weibull
case    avg    up     low        avg    up     low        avg    up     low
1       2.1    11.0   14.0       3.1    8.8    14.0       2.4    12.0   13.0
2       2.1    11.0   13.0       11.0   22.0   3.3        19.0   32.0   7.1
3       5.1    17.0   7.6        6.6    18.0   5.1        2.3    14.0   11.0
4       5.9    6.5    18.0       5.2    17.0   6.7        4.2    16.0   9.0
5       3.2    14.0   10.0       8.3    19.0   3.4        12.0   24.0   2.9
6       12.0   24.0   2.7        6.2    18.0   5.2        4.1    8.4    16.0
7       2.9    15.0   9.7        9.3    21.0   2.3        19.0   31.0   6.6
8       15.0   27.0   7.0        7.4    18.0   4.3        4.3    17.0   9.4
9       5.9    18.0   7.7        11.0   23.0   5.5        3.9    16.0   8.8
10      3.5    11.0   14.0       5.5    6.0    16.0       5.2    17.0   7.1
Average 5.7    15.0   10.0       7.4    17.0   6.6        7.7    19.0   9.0

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human machine interface.

The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.

Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32].

(a) APE of the RUL estimation for the continuous observation test cases

Test    Gaussian                  Gamma                     Weibull
case    avg    up     low        avg    up     low        avg    up     low
1       57.8   51.0   66.8       26.2   9.7    52.7       25.9   28.4   64.6
2       50.2   44.4   57.7       21.3   17.0   46.9       29.0   19.2   70.8
3       50.3   44.7   57.3       27.1   8.7    56.5       34.5   13.9   73.4
4       51.8   46.0   60.4       21.3   14.3   45.9       34.9   17.1   78.7
5       59.4   53.7   66.2       29.0   9.5    55.4       33.4   15.6   74.9
6       58.0   51.7   67.1       25.8   8.3    54.1       23.1   25.8   66.5
7       59.4   53.6   66.9       18.2   12.5   47.7       36.0   17.1   74.4
8       63.4   55.6   72.3       19.4   15.7   44.1       34.8   17.8   77.0
9       49.1   43.5   57.0       14.5   17.1   43.2       25.1   26.7   67.0
10      54.4   48.4   62.8       23.2   7.9    52.7       24.1   24.5   67.4
Average 55.4   49.3   63.5       22.6   12.1   49.9       30.1   20.6   71.5

(b) APE of the RUL estimation for the discrete observation test cases

Test    Gaussian                  Gamma                     Weibull
case    avg    up     low        avg    up     low        avg    up     low
1       51.4   41.0   62.4       42.4   31.8   53.0       32.6   26.4   73.6
2       49.6   39.9   60.4       59.5   48.3   70.8       31.3   27.6   69.3
3       50.2   38.6   62.3       46.5   35.7   57.4       32.4   25.7   70.2
4       42.2   31.5   53.8       50.1   40.5   60.6       23.7   36.1   60.3
5       44.3   33.9   55.8       47.8   37.4   59.1       36.0   25.6   76.5
6       52.2   43.2   62.7       55.2   44.3   66.9       27.2   31.6   64.3
7       55.0   43.9   66.8       56.0   45.7   67.0       34.7   23.2   74.4
8       50.3   39.0   62.0       60.4   50.5   71.0       35.1   26.4   72.4
9       55.5   47.4   64.0       48.0   37.2   59.5       31.8   22.2   73.6
10      49.0   38.2   60.7       52.1   41.2   63.1       29.4   28.9   68.7
Average 50.0   39.7   61.1       51.8   41.3   62.9       31.4   27.4   70.4

The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).

Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1               Condition 2               Condition 3
1800 rpm and 4000 N       1650 rpm and 4200 N       1500 rpm and 5000 N
Bearing     Lifetime [s]  Bearing     Lifetime [s]  Bearing     Lifetime [s]
Bearing1_1  28030         Bearing2_1  9110          Bearing3_1  5150
Bearing1_2  8710          Bearing2_2  7970          Bearing3_2  16370
Bearing1_3  23750         Bearing2_3  19550         Bearing3_3  4340
Bearing1_4  14280         Bearing2_4  7510
Bearing1_5  24630         Bearing2_5  23110
Bearing1_6  24480         Bearing2_6  7010
Bearing1_7  22590         Bearing2_7  2300

[Figure 6: Global overview of the Pronostia experimental platform, showing the rotating module, the tested bearing, the load module, and the data acquisition module [19].]

Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal became higher than 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), resembling faithfully a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data and, since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].

In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate the RMS as

$$x^{\text{RMS}}_w = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r_w^2(t)}$$

and the kurtosis as

$$x^{\text{KURT}}_w = \frac{(1/L)\sum_{t=1}^{L} \left(r_w(t) - \bar{r}_w\right)^4}{\left((1/L)\sum_{t=1}^{L} \left(r_w(t) - \bar{r}_w\right)^2\right)^2},$$

where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM, after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering conditions 1 and 2 separately, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.

5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

[Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].]

[Figure 8: Raw vibration data (a) versus the extracted RMS and kurtosis features (b) for Bearing1_1.]

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states $N$ from 2 to 6, and (iii) an increasing number of Gaussian mixtures $M$ in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations $\lambda_0$, on the data sets (Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

Similarly to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ Gaussian component for the observation density.

(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme: for condition 1, at each iteration Bearing1_i, $1 \le i \le 7$, was used as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters $\lambda^*_i$ were estimated for the $i$th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time $t$, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state $S_4$ as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on.

[Figure 9: AIC values for (a) Condition 1 and (b) Condition 2, for Gaussian, Gamma, and Weibull duration models and 2 to 6 states. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture component in the observation density.]

[Figure 10: RUL estimation for (a) Bearing1_7 and (b) Bearing2_6 (true, average, upper, and lower RUL over time). By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.]

As can be seen, the average as well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.

Concerning the quantitative estimation of the predictive performance, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.

6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required parameters to estimate.

Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1

Test Bearing   APE_avg    APE_low    APE_up
Bearing1_1     10571.6    12723.0    9414.6
Bearing1_2     4331.2     3815.6     3821.3
Bearing1_3     2997.0     9730.9     6091.2
Bearing1_4     6336.3     2876.6     14871.9
Bearing1_5     1968.9     7448.4     10411.5
Bearing1_6     4253.0     9896.4     9793.7
Bearing1_7     1388.0     7494.3     10088.1
Average        4549.4     7712.2     9213.2

(b) Condition 2

Test Bearing   APE_avg    APE_low    APE_up
Bearing2_1     2475.9     5006.5     7287.5
Bearing2_2     1647.3     4497.2     8288.6
Bearing2_3     8877.1     9508.3     7962.1
Bearing2_4     1769.8     4248.6     4982.5
Bearing2_5     8663.1     10490.0    10730.0
Bearing2_6     877.1      3504.7     6687.0
Bearing2_7     3012.5     3866.4     6651.9
Average        3903.3     5874.5     7512.8

Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20):

$$\hat{d}_{t+1}(i) = \frac{a_{ii}(\mathbf{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(\hat{d}_t(i) + 1\right). \qquad \text{(A.1)}$$

The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution:

$$d_t(i) \sim f(d). \qquad \text{(A.2)}$$

We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as

$$P\left(d_t(i) = d\right) = P\left(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \lambda\right). \qquad \text{(A.3)}$$

We omit the conditioning on the model parameters $\lambda$ in the following equations, it being inherently implied. We are interested in deriving the estimator $\hat{d}_t(i)$ of $d_t(i)$, defined as its expected value (see Equation (15)):

$$\hat{d}_t(i) = \mathbb{E}\left(d_t(i) \mid s_t = S_i, \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t\right), \quad 1 \le i \le N. \qquad \text{(A.4)}$$

From the definition of expectation we have

$$\hat{d}_t(i) = \sum_{d=1}^{t} d \cdot P\left(d_t(i) = d\right) = \sum_{d=1}^{t} d \cdot P\left(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right). \qquad \text{(A.5)}$$

For $\hat{d}_{t+1}(i)$ we have

$$\hat{d}_{t+1}(i) = \sum_{d=1}^{t+1} d \cdot P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)$$
$$= \underbrace{P\left(s_{t-1} \neq S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{(a)} \qquad \text{(A.6)}$$
$$\quad + \sum_{d=2}^{t+1} d \cdot P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right). \qquad \text{(A.7)}$$

By noticing that

$$P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \frac{P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}{P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}, \qquad \text{(A.8)}$$

we can replace the probability in the second term of (A.7) with

$$P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \underbrace{P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{(b)} \qquad \text{(A.9)}$$
$$\quad \cdot P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right). \qquad \text{(A.10)}$$

In the last factor of (A.10) we can omit the information about the current state and observation, by observing that

$$P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \approx \underbrace{P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right)}_{(c)}, \qquad \text{(A.11)}$$

if the following independencies hold:

$$s_{t+1} \perp \left\{s_{t-d+1}, \ldots, s_{t-1}\right\} \mid s_t, \mathbf{x}_1, \ldots, \mathbf{x}_t,$$
$$\mathbf{x}_{t+1} \perp \left\{s_{t-d+1}, \ldots, s_{t-1}\right\} \mid s_t, \mathbf{x}_1, \ldots, \mathbf{x}_t, \qquad \text{(A.12)}$$

where with $\perp$ we denote independence. Equations (A.12) hold for HMMs (even without conditioning on $\mathbf{x}_1, \ldots, \mathbf{x}_t$), but they do not hold for HSMMs, since the state duration (expressed by $s_{t-d+1}, \ldots, s_{t-1}$) determines the system evolution. On the other hand, the state duration is partially known from the observations $\mathbf{x}_1, \ldots, \mathbf{x}_t$; thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A.6), (A.9), and (A.11) we obtain

$$\hat{d}_{t+1}(i) = (a) + \sum_{d=2}^{t+1} d \cdot (b) \cdot (c)$$
$$= \underbrace{P\left(s_{t-1} \neq S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{P(A,B \mid C) = P(A \mid B,C)\, P(B \mid C)}
+ \sum_{d=2}^{t+1} d \cdot \underbrace{P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{\text{does not depend on } d}
\cdot P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right)$$
$$= P\left(s_{t-1} \neq S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)
+ P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot \sum_{d=2}^{t+1} d \cdot P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right)$$
$$= P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot \Bigg[\underbrace{P\left(s_{t-1} \neq S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right)}_{\text{by the approximation of (A.11)}}
+ \sum_{d'=1}^{t} (d'+1) \cdot P\left(s_{t-d'-1} \neq S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right)\Bigg]. \qquad \text{(A.13)}$$

Noticing that

$$\sum_{d'=1}^{t} P\left(s_{t-d'-1} \neq S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) + P\left(s_{t-1} \neq S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) = 1, \qquad \text{(A.14)}$$

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A.13) as follows:

$$\hat{d}_{t+1}(i) = P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot \left(\hat{d}_t(i) + 1\right). \qquad \text{(A.15)}$$

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state $i$ in the previous step.

In order to express (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $d_{t+1}(i)$, we can consider the following equality:

\[
P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \frac{P\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}{\underbrace{P\left(s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{\gamma_{t+1}(i)}}.
\tag{A.16}
\]

If we consider the terms involved in the probability at the numerator of the right-hand side of (A.16), we have that
\[
\underbrace{\mathbf{x}_1, \ldots, \mathbf{x}_t}_{B} \;\perp\; \underbrace{\mathbf{x}_{t+1}}_{C} \;\Big|\; \underbrace{s_t = S_i, s_{t+1} = S_i}_{A}.
\tag{A.17}
\]


If $B \perp C \mid A$, by Bayes' rule we have that
\[
P\left(A \mid C, B\right) = \frac{P\left(C \mid A, B\right) \cdot P\left(A \mid B\right)}{P\left(C \mid B\right)}.
\tag{A.18}
\]

Hence we can rewrite the numerator of the right-hand side of (A.16) as follows:
\[
\begin{aligned}
P\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)
&= \frac{P\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \cdot \overbrace{P\left(\mathbf{x}_{t+1} \mid s_t = S_i, s_{t+1} = S_i\right)}^{\mathbf{x}_{t+1} \perp s_t \mid s_{t+1}}}{P\left(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right)}\\[4pt]
&= \frac{P\left(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \cdot \overbrace{P\left(s_t = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right)}^{\gamma_t(i)} \cdot \overbrace{P\left(\mathbf{x}_{t+1} \mid s_{t+1} = S_i\right)}^{b_i(\mathbf{x}_{t+1})}}{P\left(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right)}.
\end{aligned}
\tag{A.19}
\]

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as
\[
P\left(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) = \sum_{d_t} a_{ii}(d_t) \cdot P\left(d_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \approx a_{ii}(\mathbf{d}_t),
\tag{A.20}
\]

while the denominator of (A.19) can be expressed as follows:
\[
P\left(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right) = \frac{P\left(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1}\right)}{P\left(\mathbf{x}_1, \ldots, \mathbf{x}_t\right)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}.
\tag{A.21}
\]

By substituting (A.20) and (A.21) in (A.19) we obtain
\[
P\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \frac{a_{ii}(\mathbf{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)},
\tag{A.22}
\]

and then, by combining (A.22) and (A.16), we obtain
\[
P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) = \frac{a_{ii}(\mathbf{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \sum_{i=1}^{N} \alpha_{t+1}(i)}.
\tag{A.23}
\]

Finally, by substituting (A.23) in (A.15) and considering that
\[
\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)},
\tag{A.24}
\]
we derive the induction formula for $d_{t+1}(i)$ in terms of model parameters as
\[
d_{t+1}(i) = \frac{a_{ii}(\mathbf{d}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \left(d_t(i) + 1\right).
\tag{A.25}
\]
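As a complement to the derivation, the following minimal Python sketch shows how the recursion (A.25) could be evaluated at each time step; it assumes that the forward variables, the duration-dependent self-transition probabilities evaluated at the current average durations, and the emission likelihoods are already available, and all names are illustrative rather than part of the original formulation.

    import numpy as np

    def update_average_duration(d_t, alpha_t, alpha_t1, a_ii, b_x_next):
        """One step of the average-duration recursion of (A.25).

        d_t      : (N,) current average durations d_t(i)
        alpha_t  : (N,) forward variables at time t
        alpha_t1 : (N,) forward variables at time t+1
        a_ii     : (N,) self-transition probabilities a_ii(d_t), one per state
        b_x_next : (N,) emission likelihoods b_i(x_{t+1})
        """
        # Probability of having already been in state i at time t,
        # given state i at time t+1 (Equations (A.23) and (A.15)).
        stay = a_ii * alpha_t * b_x_next / np.maximum(alpha_t1, 1e-300)
        # "Previous average duration plus one", weighted by that probability.
        return stay * (d_t + 1.0)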

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka and Kariya, Japan, October-November 1989), E. Oshima and C. van Rijn, Eds., pp. 195-198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229-240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469-489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universitat zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125-137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291-296, International Federation of Automatic Control, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257-262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474-481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1-7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338-343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1-10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491-503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292-302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143-179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644-648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407-410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558-567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871-874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215-243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11-14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947-1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991-996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658-2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279-285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248-2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166-172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141-3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611-631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535-569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249-264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241-249, Springer, New York, NY, USA, 2008.
[41] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573-589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29-45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331-334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260-269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268-278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B, vol. 39, no. 1, pp. 1-38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853-872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327-337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299-306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1-7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451-1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5-8, pp. 1685-1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1-6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1-4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157-162, 2013.



Concerning the observation parameters, once the modified forward and backward variables accounting for the state duration are defined as in (22) and (28), the reestimation formulas are the same as for Hidden Markov Models [13].

In particular, for continuous observations, the parameters of the Gaussians' mixture defined in (9) are reestimated by firstly defining the probability of being in state $S_j$ at time $t$, with the probability of the observation vector $\mathbf{x}_t$ evaluated by the $k$th mixture component, as

\[
\gamma_t(j,k) = \left[\frac{\alpha_t(j)\,\beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}\right] \cdot \left[\frac{c_{jk}\,\mathcal{N}\!\left(\mathbf{x}_t, \boldsymbol{\mu}_{jk}, \mathbf{U}_{jk}\right)}{\sum_{m=1}^{M} c_{jm}\,\mathcal{N}\!\left(\mathbf{x}_t, \boldsymbol{\mu}_{jm}, \mathbf{U}_{jm}\right)}\right].
\tag{43}
\]

By using the former quantity, the parameters $c_{jk}$, $\boldsymbol{\mu}_{jk}$, and $\mathbf{U}_{jk}$ are reestimated through the following formulas:

\[
\begin{aligned}
c_{jk} &= \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j,m)},\\[4pt]
\boldsymbol{\mu}_{jk} &= \frac{\sum_{t=1}^{T} \gamma_t(j,k) \cdot \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(j,k)},\\[4pt]
\mathbf{U}_{jk} &= \frac{\sum_{t=1}^{T} \gamma_t(j,k) \cdot \left(\mathbf{x}_t - \boldsymbol{\mu}_{jk}\right)\left(\mathbf{x}_t - \boldsymbol{\mu}_{jk}\right)^{T}}{\sum_{t=1}^{T} \gamma_t(j,k)},
\end{aligned}
\tag{44}
\]

where superscript $T$ denotes vector transpose. For discrete observations, the reestimation formula for the observation matrix $b_j(l)$ is

\[
b_j(l) = \frac{\sum_{t=1,\ x_t = X_l}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)},
\tag{45}
\]

where the quantity $\gamma_t(j)$, which takes into account the duration-dependent forward variable $\alpha_t(j)$, is calculated through (17). The reader is referred to Rabiner's work [13] for the interpretation of the observation parameter reestimation formulas.

3. AIC-Based Model Selection

In the framework of the proposed parametric HSMMs, the model selection procedure aims to select the optimal number of hidden states $N$, the right duration distribution family, and, in the case of mixture observation modeling, the number of Gaussian mixtures $M$ to be used. In this work we make use of the Akaike Information Criterion (AIC). Indeed, it has been seen that, in the case of complex models and in the presence of a limited number of training observations, AIC represents a satisfactory methodology for model selection, outperforming other approaches like the Bayesian Information Criterion.

In general, information criteria are represented as a two-term structure. They account for a compromise between a measure of model fitness, which is based on the likelihood of the model, and a penalty term, which takes into account the model complexity. Usually the model complexity is measured in terms of the number of parameters that have to be estimated and in terms of the number of observations.

The Akaike Information Criterion is an estimate of the asymptotic value of the expected distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. In particular, the AIC can be expressed as
\[
\text{AIC} = \frac{-\log L(\hat{\lambda}) + p}{T},
\tag{46}
\]
where $L(\hat{\lambda})$ is the likelihood of the model with the estimated parameters as defined in (25), $p$ is the number of model parameters, and $T$ is the length of the observed sequence. The best model is the one minimizing equation (46).

Concerning $p$, the number of parameters to be estimated for a parametric HSMM with $N$ states is $p = p_h + p_o$, where $p_h$ are the parameters of the hidden states layer, while $p_o$ are those of the observation layer.

In particular, $p_h = (N - 1) + (N - 1) \cdot N + z \cdot N$, where

(i) $N - 1$ accounts for the prior probabilities $\pi$;
(ii) $(N - 1) \cdot N$ accounts for the nonrecurrent transition matrix $\mathbf{A}^0$;
(iii) $z \cdot N$ accounts for the duration probability, $z$ being the number of parameters $\theta$ of the duration distribution.

Concerning $p_o$, a distinction must be made between discrete and continuous observations (a parameter-counting sketch is given below):

(i) in the case of discrete observations with $L$ possible observable values, $p_o = (L - 1) \cdot N$, which accounts for the elements of the observation matrix $B$;
(ii) if the observations are continuous and a multivariate mixture of $M$ Gaussians with $O$ variates is used as observation model, $p_o = [O \cdot N \cdot M] + [O \cdot O \cdot N \cdot M] + [(M - 1) \cdot N]$, where each term accounts, respectively, for the mean vectors $\boldsymbol{\mu}$, the covariance matrices $\mathbf{U}$, and the mixture coefficients $C$.
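The parameter counting above and the criterion of (46) translate directly into a few lines of code; the following sketch is only illustrative, and the function names are not part of the paper.

    def hsmm_num_parameters(N, z, discrete, L=None, M=None, O=None):
        """p = p_h + p_o for a parametric HSMM with N states and z duration parameters per state."""
        p_h = (N - 1) + (N - 1) * N + z * N                 # priors, nonrecurrent A0, durations
        if discrete:
            p_o = (L - 1) * N                                # observation matrix B
        else:
            p_o = O * N * M + O * O * N * M + (M - 1) * N    # means, covariances, mixture weights
        return p_h + p_o

    def aic(log_likelihood, p, T):
        """Akaike Information Criterion of Equation (46)."""
        return (-log_likelihood + p) / T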

4. Remaining Useful Lifetime Estimation

One of the most important advantages of the time modeling of HSMMs is the possibility to effectively face the prediction problem. The knowledge of the state duration distributions allows the estimation of the remaining time in a certain state and, in general, the prediction of the expected time $D$ before entering a determinate state.

As already mentioned, an interesting application of the prediction problem is the Remaining Useful Lifetime (RUL) estimation of industrial equipment. Indeed, if each state of an HSMM is mapped to a different condition of an industrial machine and if the state $S_k$ that represents the failure condition is identified, at each moment the RUL can be defined as the expected time $D$ to reach the failure state


$S_k$. If we assume that the time to failure is a random variable $D$ following a determinate probability density, we define the RUL at the current time $t$ as
\[
\text{RUL}_t = \bar{D} = \mathbb{E}(D), \qquad s_{t+\bar{D}} = S_k,\ s_{t+\bar{D}-1} = S_i, \qquad 1 \le i, k \le N,\ i \neq k,
\tag{47}
\]

where $\mathbb{E}$ denotes the expected value. Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.

The estimation of the current state is performed via the Viterbi path, that is, the variable $\delta_t = [\delta_t(i)]_{1\le i\le N}$ defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable $\tilde{\delta}_t(i)$, obtained as
\[
\tilde{\delta}_t(i) = \max_{s_1, s_2, \ldots, s_{t-1}} P\left(s_t = S_i \mid s_1 s_2 \cdots s_{t-1}, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda\right) = \frac{\delta_t(i)}{\sum_{j=1}^{N} \delta_t(j)}, \qquad 1 \le i \le N,
\tag{48}
\]

that is, an estimate of the probability of being in state $S_i$ at time $t$. Together with the normalized variable $\tilde{\delta}_t(i)$, the maximum a posteriori estimate of the current state $s^*_t$ is taken into account, according to (34). If $s^*_t$ coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimation of the average remaining time in the current state, $d_{\text{avg}}(s^*_t)$, is calculated as

\[
d_{\text{avg}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - d_t(i)\right) \odot \tilde{\delta}_t(i),
\tag{49}
\]

where with $\mu_{d_i}$ we denote the expected value of the duration variable in state $S_i$, according to the duration distribution specified by the parameters $\theta_i$. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration $d_t(i)$ at time $t$ from the expected sojourn time of state $S_i$, weighting the result using the uncertainty about the current state $\tilde{\delta}_t(i)$, and finally summing up all the contributions from each state.

In addition to the average remaining time, a lower and an upper bound value can be calculated based on the standard deviation $\sigma_{d_i}$ of the duration distribution for state $S_i$ (a small computational sketch follows the two equations below):

\[
d_{\text{low}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i} - d_t(i)\right) \odot \tilde{\delta}_t(i),
\tag{50}
\]
\[
d_{\text{up}}(s^*_t) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i} - d_t(i)\right) \odot \tilde{\delta}_t(i).
\tag{51}
\]
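Assuming the duration means, standard deviations, estimated durations, and normalized state probabilities are stored as NumPy arrays of length $N$, Equations (49)-(51) amount to the following small sketch; the names are illustrative.

    import numpy as np

    def remaining_time_current_state(mu_d, sigma_d, d_t, delta_t):
        """Average, lower, and upper remaining time in the current state, (49)-(51)."""
        d_avg = np.sum((mu_d - d_t) * delta_t)              # expected sojourn minus elapsed duration
        d_low = np.sum((mu_d - sigma_d - d_t) * delta_t)    # pessimistic bound
        d_up = np.sum((mu_d + sigma_d - d_t) * delta_t)     # optimistic bound
        return d_avg, d_low, d_up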

Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimation, as follows:
\[
\tilde{\delta}_{\text{next}} = \left[\tilde{\delta}_{t+d}(i)\right]_{1\le i\le N} = \left(\mathbf{A}^0\right)^{T} \cdot \tilde{\delta}_t,
\tag{52}
\]

while the maximum a posteriori estimate of the next state $s^*_{\text{next}}$ is calculated as
\[
s^*_{\text{next}} = s^*_{t+d} = \arg\max_{1\le i\le N} \tilde{\delta}_{t+d}(i).
\tag{53}
\]

Again, if $s^*_{t+d}$ coincides with the failure state, the failure will happen after the remaining time in the current state is over, and the average estimation of the failure time is $D_{\text{avg}} = d_{\text{avg}}(s^*_t)$, calculated at the previous step, with the bound values $D_{\text{low}} = d_{\text{low}}(s^*_t)$ and $D_{\text{up}} = d_{\text{up}}(s^*_t)$. Otherwise, the estimation of the sojourn time of the next state is calculated as follows:

\[
d_{\text{avg}}(s^*_{t+d}) = \sum_{i=1}^{N} \mu_{d_i} \odot \tilde{\delta}_{t+d}(i),
\tag{54}
\]
\[
d_{\text{low}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} - \sigma_{d_i}\right) \odot \tilde{\delta}_{t+d}(i),
\tag{55}
\]
\[
d_{\text{up}}(s^*_{t+d}) = \sum_{i=1}^{N} \left(\mu_{d_i} + \sigma_{d_i}\right) \odot \tilde{\delta}_{t+d}(i).
\tag{56}
\]

This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:
\[
D_{\text{avg}} = \sum d_{\text{avg}},
\tag{57}
\]
\[
D_{\text{low}} = \sum d_{\text{low}},
\tag{58}
\]
\[
D_{\text{up}} = \sum d_{\text{up}}.
\tag{59}
\]
Finally, Algorithm 1 details the above described RUL estimation procedure.

5. Experimental Results

To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


(1) function RulEstimation($\mathbf{x}_t$, $S_k$)    ⊳ $\mathbf{x}_t$: the last observation acquired
(2)                                                   ⊳ $S_k$: the failure state
(3) Initialization:
(4)   $D_{\text{avg}} \leftarrow 0$
(5)   $D_{\text{low}} \leftarrow 0$
(6)   $D_{\text{up}} \leftarrow 0$
(7) Current state estimation:
(8)   Calculate $\tilde{\delta}_t$    ⊳ Using (48)
(9)   Calculate $s^*_t$    ⊳ Using (34)
(10)  Calculate $\mathbf{d}_t$    ⊳ Using (20)
(11)  $S \leftarrow s^*_t$
(12) Loop:
(13)  while $S \neq S_k$ do
(14)    Calculate $d_{\text{avg}}$    ⊳ Using (49) or (54)
(15)    Calculate $d_{\text{low}}$    ⊳ Using (50) or (55)
(16)    Calculate $d_{\text{up}}$    ⊳ Using (51) or (56)
(17)    $D_{\text{avg}} \leftarrow D_{\text{avg}} + d_{\text{avg}}$
(18)    $D_{\text{low}} \leftarrow D_{\text{low}} + d_{\text{low}}$
(19)    $D_{\text{up}} \leftarrow D_{\text{up}} + d_{\text{up}}$
(20)    Calculate $\tilde{\delta}_{\text{next}}$    ⊳ Using (52)
(21)    Calculate $s^*_{\text{next}}$    ⊳ Using (53)
(22)    $S \leftarrow s^*_{\text{next}}$
      end while
(23)  return $D_{\text{avg}}$, $D_{\text{low}}$, $D_{\text{up}}$

Algorithm 1: Remaining Useful Lifetime estimation (pseudo-code).
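A compact Python rendition of Algorithm 1 could look as follows. It is a sketch under the assumption that the normalized Viterbi variable, the estimated durations, the duration means and standard deviations, and the nonrecurrent transition matrix have already been computed; the loop guard is an addition that is not present in the original pseudo-code.

    import numpy as np

    def rul_estimation(delta_t, d_t, mu_d, sigma_d, A0, failure_state, max_hops=100):
        """Sketch of Algorithm 1: accumulate remaining sojourn times until the failure state."""
        D_avg = D_low = D_up = 0.0
        delta = delta_t.copy()
        s = int(np.argmax(delta))                    # MAP estimate of the current state, (34)
        first = True
        for _ in range(max_hops):                    # guard against non-terminating projections
            if s == failure_state:
                break
            elapsed = d_t if first else np.zeros_like(d_t)   # (49)-(51) first, then (54)-(56)
            D_avg += np.sum((mu_d - elapsed) * delta)
            D_low += np.sum((mu_d - sigma_d - elapsed) * delta)
            D_up += np.sum((mu_d + sigma_d - elapsed) * delta)
            delta = A0.T @ delta                     # next-state probabilities, (52)
            s = int(np.argmax(delta))                # MAP next state, (53)
            first = False
        return D_avg, D_low, D_up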

5.1. Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performance of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.

5.1.1. Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with $N = 5$ states, having state $S_5$ as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective to solve automatic model selection, online condition monitoring, and prediction problems. For this purpose, we divided the experiments into two cases, according to the nature of the observations, being continuous or discrete.

For each of the continuous and the discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains $T = 650$ observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM as follows:

\[
\pi = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\qquad
\mathbf{A}^0 = \begin{bmatrix}
0 & 1 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 1
\end{bmatrix},
\]
\[
\Theta_{\mathcal{N}} = \left\{\theta_1 = [100, 20],\ \theta_2 = [90, 15],\ \theta_3 = [100, 20],\ \theta_4 = [80, 25],\ \theta_5 = [200, 1]\right\},
\]
\[
\Theta_{\mathcal{G}} = \left\{\theta_1 = [500, 0.2],\ \theta_2 = [540, 0.1667],\ \theta_3 = [500, 0.2],\ \theta_4 = [256, 0.3125],\ \theta_5 = [800, 0.005]\right\},
\]
\[
\Theta_{\mathcal{W}} = \left\{\theta_1 = [102, 28],\ \theta_2 = [92, 29],\ \theta_3 = [102, 28],\ \theta_4 = [82, 20],\ \theta_5 = [200, 256]\right\},
\tag{60}
\]

where $\Theta_{\mathcal{N}}$, $\Theta_{\mathcal{G}}$, and $\Theta_{\mathcal{W}}$ are the different distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean $\mu_d$ and the variance $\sigma^2_d$ of the Gaussian distribution, the shape $\nu_d$ and the scale $\eta_d$ of the Gamma distribution, and the scale $a_d$ and the shape $b_d$ of the Weibull distribution.


[Figure 2: The data generated with the parameters described in Section 5.1.1, both for the continuous case (a) and the discrete case (b); each panel shows the hidden state sequence, the state duration, and the observed signal/symbols.]

It must be noticed that, as explained in Section 2.1, for state $S_5$, being the absorbing state, the duration parameters $\theta_5$ have no influence on the data, since once the state $S_5$ is reached the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used with the following parameters [15]:

\[
\boldsymbol{\mu}_1 = \begin{bmatrix} 20 \\ 20 \end{bmatrix},\quad
\boldsymbol{\mu}_2 = \begin{bmatrix} 20 \\ 35 \end{bmatrix},\quad
\boldsymbol{\mu}_3 = \begin{bmatrix} 35 \\ 35 \end{bmatrix},\quad
\boldsymbol{\mu}_5 = \begin{bmatrix} 28 \\ 28 \end{bmatrix},
\]
\[
\mathbf{U}_1 = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix},\quad
\mathbf{U}_2 = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix},\quad
\mathbf{U}_3 = \begin{bmatrix} 15 & -2 \\ -2 & 15 \end{bmatrix},\quad
\mathbf{U}_4 = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix},\quad
\mathbf{U}_5 = \begin{bmatrix} 10 & 3 \\ 3 & 10 \end{bmatrix},
\tag{61}
\]

while for the discrete case $L = 7$ distinct observation symbols have been taken into consideration, with the following observation probability distribution:

\[
B = \begin{bmatrix}
0.8 & 0.2 & 0 & 0 & 0 & 0 & 0\\
0.1 & 0.8 & 0.1 & 0 & 0 & 0 & 0\\
0 & 0.1 & 0.8 & 0.1 & 0 & 0 & 0\\
0 & 0 & 0.1 & 0.7 & 0.1 & 0.1 & 0\\
0 & 0 & 0 & 0.2 & 0.6 & 0.1 & 0.1
\end{bmatrix}.
\tag{62}
\]

An example of simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
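To give an idea of how such data can be produced, the following sketch samples one trajectory from a left-right parametric HSMM; the duration and emission samplers are passed in as functions (for instance, Gaussian durations with the parameters of (60) and bivariate Gaussian emissions with the parameters of (61)), and the helper names as well as the absorbing-state check are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_left_right_hsmm(pi, A0, duration_sampler, emit_sampler, T=650):
        """Generate one (states, observations) pair of length T from a left-right HSMM."""
        states, obs = [], []
        s = rng.choice(len(pi), p=pi)                        # initial state from pi
        while len(states) < T:
            if A0[s, s] == 1.0:                              # absorbing (failure) state
                d = T - len(states)
            else:
                d = max(1, int(round(duration_sampler(s))))  # parametric sojourn time
            for _ in range(min(d, T - len(states))):
                states.append(s)
                obs.append(emit_sampler(s))
            if len(states) < T:
                s = rng.choice(A0.shape[1], p=A0[s])         # nonrecurrent transition
        return np.array(states), np.array(obs)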

5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection problem. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states $N$ from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures $M$ in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fitting [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters $\lambda^0$, has been executed for each of the considered HSMM structures. For each model structure, the AIC value as defined in (46) has been evaluated. The final trained set of parameters $\lambda^*$ corresponding to the minimum AIC value has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.
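The multiple-restart selection just described can be expressed schematically as below; fit_hsmm is a placeholder for one EM training run returning the fitted model and its AIC value, and is not an API provided by the paper.

    def select_model(train_seqs, candidate_structures, fit_hsmm, n_restarts=40):
        """Return the structure and parameters with the minimum AIC over all restarts."""
        best = None
        for structure in candidate_structures:         # e.g. (N, duration_family, M) tuples
            for seed in range(n_restarts):             # random initializations lambda^0
                model, aic_value = fit_hsmm(train_seqs, structure, seed)
                if best is None or aic_value < best[0]:
                    best = (aic_value, structure, model)
        return best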

The obtained results are shown in Figure 3 for both the continuous and discrete observation data. As can be noticed, for all the 6 test cases of Figure 3, the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered as the optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.

5.1.3. Condition Monitoring. The optimal parameters $\lambda^*$ obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, $\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t$. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state $s^*_t = \arg\max_{1\le i\le N}[\delta_t(i)]$, as specified in (34).


[Figure 3: Akaike Information Criterion (AIC) values versus the number of states (from 2 to 8) for continuous data with (a) Gaussian, (b) Gamma, and (c) Weibull duration distributions, and for discrete data with (d) Gaussian, (e) Gamma, and (f) Weibull duration distributions. AIC is effective for automatic model selection, since its minimum value identifies the same number of states and duration model used to generate the data.]

An example of execution of the condition monitoring experiment is shown in Figure 4 for continuous and discrete observations, respectively. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of the testing cases.


[Figure 4: Condition monitoring using the Viterbi path: (a) state estimation for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992). HSMMs can be effective for condition monitoring in time-dependent applications, due to their high accuracy in hidden state recognition.]

The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.

5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase we considered the state $S_5$ as the failure state and the trained parameters $\lambda^*$ of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations $\mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t$ up to time $t$. When a new observation is acquired, after the current state probability $\tilde{\delta}_t(i)$ is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures, one can notice that the average, as well as the lower and the upper bound estimations, converges to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches, and the estimation becomes more precise until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.

To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time $t$, the absolute prediction error (APE) between the real RUL and the predicted value, defined as
\[
\text{APE}(t) = \left|\text{RUL}_{\text{real}}(t) - \widehat{\text{RUL}}(t)\right|,
\tag{63}
\]

where $\text{RUL}_{\text{real}}(t)$ is the (known) value of the RUL at time $t$, while $\widehat{\text{RUL}}(t)$ is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

\[
\overline{\text{APE}} = \frac{\sum_{t=1}^{T} \text{APE}(t)}{T},
\tag{64}
\]

where $T$ is the length of the testing signal. APE being a prediction error, average values of (64) close to zero correspond to good predictive performances.
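For completeness, the two error measures of (63) and (64) reduce to a few lines of code; the sketch below is only illustrative.

    import numpy as np

    def average_absolute_prediction_error(rul_true, rul_pred):
        """APE(t) = |RUL_real(t) - RUL_hat(t)| and its average over the testing signal."""
        ape = np.abs(np.asarray(rul_true, dtype=float) - np.asarray(rul_pred, dtype=float))
        return ape, ape.mean()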

The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b) for the continuous and the discrete observation cases, respectively. As can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30-32]. The results are shown in Tables 3(a) and 3(b), in which, respectively, the prediction errors obtained for continuous and discrete observations are reported.

Comparing Tables 2 and 3, one can notice that the proposed RUL method outperforms the one of Azimi.


[Figure 5: HSMMs effectively solve RUL estimation problems: true, average, lower, and upper RUL over time for (a) continuous data with Weibull duration distribution and (b) discrete data with Gamma duration distribution. The prediction converges to the actual RUL value and its uncertainty decreases as the real failure time approaches.]

Table 1: State recognition accuracy (%).

(a) Continuous observations

Test case   Gaussian   Gamma   Weibull
1           99.4       98.5    99.2
2           99.7       98.6    99.5
3           99.4       99.2    99.7
4           98.9       98.9    99.7
5           98.2       98.9    100
6           99.1       98.8    99.7
7           98.5       99.4    99.7
8           99.2       99.1    99.5
9           99.2       98.6    99.7
10          99.2       99.1    99.5
Average     99.1       98.9    99.6

(b) Discrete observations

Test case   Gaussian   Gamma   Weibull
1           97.4       96.7    97.4
2           97.2       97.6    96.5
3           99.4       95.8    96.6
4           98.2       95.3    97.7
5           99.1       97.4    97.5
6           97.8       97.7    97.8
7           95.8       97.2    96.6
8           97.7       96.4    97.2
9           98.9       97.2    98.5
10          99.2       95.6    96.9
Average     98.1       96.7    97.3

This is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi given by (16).

5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51-59].

The Pronostia platform makes it possible to perform run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing


Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20).

(a) APE of the RUL estimation for the continuous observation test cases

Test case   Gaussian (avg / up / low)   Gamma (avg / up / low)   Weibull (avg / up / low)
1           5.1 / 17.0 / 6.7            14.0 / 29.0 / 0.91       4.5 / 17.0 / 8.1
2           7.6 / 19.0 / 5.0            6.1 / 21.0 / 8.5         6.6 / 19.0 / 6.1
3           7.7 / 5.4 / 19.0            2.9 / 12.0 / 17.0        16.0 / 29.0 / 3.0
4           9.0 / 21.0 / 2.9            7.5 / 22.0 / 6.8         6.0 / 19.0 / 6.7
5           7.3 / 19.0 / 4.7            2.2 / 14.0 / 14.0        3.9 / 17.0 / 8.7
6           6.5 / 18.0 / 5.6            5.1 / 18.0 / 10.0        14.0 / 27.0 / 2.7
7           4.7 / 16.0 / 7.5            4.8 / 17.0 / 11.0        1.2 / 13.0 / 12.0
8           10.0 / 22.0 / 2.9           5.2 / 18.0 / 10.0        9.2 / 22.0 / 3.9
9           3.1 / 9.2 / 14.0            2.0 / 16.0 / 13.0        8.2 / 21.0 / 4.9
10          6.4 / 18.0 / 5.6            7.5 / 22.0 / 6.9         3.3 / 12.0 / 13.0
Average     6.8 / 17.0 / 7.4            5.7 / 19.0 / 9.9         7.3 / 20.0 / 7.0

(b) APE of the RUL estimation for the discrete observation test cases

Test case   Gaussian (avg / up / low)   Gamma (avg / up / low)   Weibull (avg / up / low)
1           2.1 / 11.0 / 14.0           3.1 / 8.8 / 14.0         2.4 / 12.0 / 13.0
2           2.1 / 11.0 / 13.0           11.0 / 22.0 / 3.3        19.0 / 32.0 / 7.1
3           5.1 / 17.0 / 7.6            6.6 / 18.0 / 5.1         2.3 / 14.0 / 11.0
4           5.9 / 6.5 / 18.0            5.2 / 17.0 / 6.7         4.2 / 16.0 / 9.0
5           3.2 / 14.0 / 10.0           8.3 / 19.0 / 3.4         12.0 / 24.0 / 2.9
6           12.0 / 24.0 / 2.7           6.2 / 18.0 / 5.2         4.1 / 8.4 / 16.0
7           2.9 / 15.0 / 9.7            9.3 / 21.0 / 2.3         19.0 / 31.0 / 6.6
8           15.0 / 27.0 / 7.0           7.4 / 18.0 / 4.3         4.3 / 17.0 / 9.4
9           5.9 / 18.0 / 7.7            11.0 / 23.0 / 5.5        3.9 / 16.0 / 8.8
10          3.5 / 11.0 / 14.0           5.5 / 6.0 / 16.0         5.2 / 17.0 / 7.1
Average     5.7 / 15.0 / 10.0           7.4 / 17.0 / 6.6         7.7 / 19.0 / 9.0

housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human machine interface.

The load profile part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.

Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30-32].

(a) APE of the RUL estimation for the continuous observation test cases

Test case   Gaussian (avg / up / low)   Gamma (avg / up / low)   Weibull (avg / up / low)
1           57.8 / 51.0 / 66.8          26.2 / 9.7 / 52.7        25.9 / 28.4 / 64.6
2           50.2 / 44.4 / 57.7          21.3 / 17.0 / 46.9       29.0 / 19.2 / 70.8
3           50.3 / 44.7 / 57.3          27.1 / 8.7 / 56.5        34.5 / 13.9 / 73.4
4           51.8 / 46.0 / 60.4          21.3 / 14.3 / 45.9       34.9 / 17.1 / 78.7
5           59.4 / 53.7 / 66.2          29.0 / 9.5 / 55.4        33.4 / 15.6 / 74.9
6           58.0 / 51.7 / 67.1          25.8 / 8.3 / 54.1        23.1 / 25.8 / 66.5
7           59.4 / 53.6 / 66.9          18.2 / 12.5 / 47.7       36.0 / 17.1 / 74.4
8           63.4 / 55.6 / 72.3          19.4 / 15.7 / 44.1       34.8 / 17.8 / 77.0
9           49.1 / 43.5 / 57.0          14.5 / 17.1 / 43.2       25.1 / 26.7 / 67.0
10          54.4 / 48.4 / 62.8          23.2 / 7.9 / 52.7        24.1 / 24.5 / 67.4
Average     55.4 / 49.3 / 63.5          22.6 / 12.1 / 49.9       30.1 / 20.6 / 71.5

(b) APE of the RUL estimation for the discrete observation test cases

Test case   Gaussian (avg / up / low)   Gamma (avg / up / low)   Weibull (avg / up / low)
1           51.4 / 41.0 / 62.4          42.4 / 31.8 / 53.0       32.6 / 26.4 / 73.6
2           49.6 / 39.9 / 60.4          59.5 / 48.3 / 70.8       31.3 / 27.6 / 69.3
3           50.2 / 38.6 / 62.3          46.5 / 35.7 / 57.4       32.4 / 25.7 / 70.2
4           42.2 / 31.5 / 53.8          50.1 / 40.5 / 60.6       23.7 / 36.1 / 60.3
5           44.3 / 33.9 / 55.8          47.8 / 37.4 / 59.1       36.0 / 25.6 / 76.5
6           52.2 / 43.2 / 62.7          55.2 / 44.3 / 66.9       27.2 / 31.6 / 64.3
7           55.0 / 43.9 / 66.8          56.0 / 45.7 / 67.0       34.7 / 23.2 / 74.4
8           50.3 / 39.0 / 62.0          60.4 / 50.5 / 71.0       35.1 / 26.4 / 72.4
9           55.5 / 47.4 / 64.0          48.0 / 37.2 / 59.5       31.8 / 22.2 / 73.6
10          49.0 / 38.2 / 60.7          52.1 / 41.2 / 63.1       29.4 / 28.9 / 68.7
Average     50.0 / 39.7 / 61.1          51.8 / 41.3 / 62.9       31.4 / 27.4 / 70.4

The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1 (1800 rpm and 4000 N): Bearing1_1: 28030; Bearing1_2: 8710; Bearing1_3: 23750; Bearing1_4: 14280; Bearing1_5: 24630; Bearing1_6: 24480; Bearing1_7: 22590.
Condition 2 (1650 rpm and 4200 N): Bearing2_1: 9110; Bearing2_2: 7970; Bearing2_3: 19550; Bearing2_4: 7510; Bearing2_5: 23110; Bearing2_6: 7010; Bearing2_7: 2300.
Condition 3 (1500 rpm and 5000 N): Bearing3_1: 5150; Bearing3_2: 16370; Bearing3_3: 4340.

[Figure 6: Global overview of the Pronostia experimental platform [19], showing the rotating module, the tested bearing, the load module, and the data acquisition module.]

Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;
(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;
(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal was higher than 20 g; thus, this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].

In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate the RMS as
\[
x^{\text{RMS}}_w = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r_w^2(t)}
\]
and the kurtosis as
\[
x^{\text{KURT}}_w = \frac{(1/L)\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^4}{\left((1/L)\sum_{t=1}^{L}\left(r_w(t) - \bar{r}_w\right)^2\right)^2},
\]
where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
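A minimal sketch of the windowed RMS and kurtosis feature extraction described above, assuming the raw horizontal-accelerometer signal is available as a one-dimensional NumPy array, is the following; the function name is illustrative.

    import numpy as np

    def window_features(raw, L=2560):
        """Per-window RMS and kurtosis features, one row per snapshot of L samples."""
        n = len(raw) // L
        windows = raw[:n * L].reshape(n, L)
        rms = np.sqrt(np.mean(windows ** 2, axis=1))
        centered = windows - windows.mean(axis=1, keepdims=True)
        kurt = np.mean(centered ** 4, axis=1) / np.mean(centered ** 2, axis=1) ** 2
        return np.column_stack([rms, kurt])          # observation sequence x_1, ..., x_n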

5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states $N$ from 2 to 6, and (iii) an increasing number of Gaussian mixtures $M$ in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations $\lambda^0$, on the data sets (Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).


[Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].]

[Figure 8: Raw vibration data (a) versus the extracted RMS and kurtosis features (b) for Bearing1_1.]


Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and $M = 1$ Gaussian component for the observation density.

(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme: for condition 1, at each iteration Bearing1_i, $1 \le i \le 7$, was used as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters $\lambda^*_i$ were estimated for the $i$th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time $t$, the average, lower, and upper RUL as specified in (57), (58), and (59), respectively, considering the state $S_4$ as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.


[Figure 9: AIC values versus the number of states (from 2 to 6) for the Gaussian, Gamma, and Weibull duration models: (a) condition 1; (b) condition 2. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture component in the observation density.]

[Figure 10: RUL estimation (true, average, lower, and upper RUL) for (a) Bearing1_7 and (b) Bearing2_6. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.]

well as the lower and the upper bound estimations convergeto the real RUL as the real failure time approaches and theuncertainty about the estimation decreases with time

Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As it can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric takes into account also the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.

Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1

Test bearing    APE_avg    APE_low    APE_up
Bearing1_1      10571.6    12723.0     9414.6
Bearing1_2       4331.2     3815.6     3821.3
Bearing1_3       2997.0     9730.9     6091.2
Bearing1_4       6336.3     2876.6    14871.9
Bearing1_5       1968.9     7448.4    10411.5
Bearing1_6       4253.0     9896.4     9793.7
Bearing1_7       1388.0     7494.3    10088.1
Average          4549.4     7712.2     9213.2

(b) Condition 2

Test bearing    APE_avg    APE_low    APE_up
Bearing2_1       2475.9     5006.5     7287.5
Bearing2_2       1647.3     4497.2     8288.6
Bearing2_3       8877.1     9508.3     7962.1
Bearing2_4       1769.8     4248.6     4982.5
Bearing2_5       8663.1    10490.0    10730.0
Bearing2_6        877.1     3504.7     6687.0
Bearing2_7       3012.5     3866.4     6651.9
Average          3903.3     5874.5     7512.8

6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few parameters that need to be estimated. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure, capable of adapting the model parameters to new conditions, will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20), that is,

\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1). \qquad (A.1)

The random variable d_t(i) has been defined in Section 2.1 as the duration spent in state i prior to the current time t, assuming that the state at the current time t is i; d_t(i) is sampled from an arbitrary distribution,

d_t(i) \sim f(d). \qquad (A.2)

We can specify the probability that the system has been in state i for d time units prior to the current time t, given the observations and the model parameters \lambda, and knowing that the current state is i, as

P(d_t(i) = d) = P(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t, \lambda). \qquad (A.3)

We omit the conditioning on the model parameters \lambda in the following equations, being inherently implied. We are interested in deriving the estimator \bar{d}_t(i) of d_t(i), defined as its expected value (see Equation (15)):

\bar{d}_t(i) = E(d_t(i) \mid s_t = S_i, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t), \quad 1 \le i \le N. \qquad (A.4)

From the definition of expectation we have

\bar{d}_t(i) = \sum_{d=1}^{t} d \cdot P(d_t(i) = d) = \sum_{d=1}^{t} d \cdot P(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t). \qquad (A.5)

For \bar{d}_{t+1}(i) we have

\bar{d}_{t+1}(i) = \sum_{d=1}^{t+1} d \cdot P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1})
= \underbrace{P(s_{t-1} \neq S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1})}_{(a)} \qquad (A.6)
+ \sum_{d=2}^{t+1} d \cdot P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1}). \qquad (A.7)

By noticing that

P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1})
= \frac{P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1})}{P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1})}, \qquad (A.8)

we can replace the probability in the second term of (A.7) with

P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1})
= \underbrace{P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1})}_{(b)} \qquad (A.9)
\cdot P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1}). \qquad (A.10)

In the last factor of (A.10) we can omit the information about the state and the observation at time t + 1 by observing that

P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1})
\approx \underbrace{P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t)}_{(c)}, \qquad (A.11)

if the following independencies hold:

s_{t+1} \perp s_{t-d+1}, \ldots, s_{t-1} \mid s_t, \mathbf{x}_1, \ldots, \mathbf{x}_t,
\mathbf{x}_{t+1} \perp s_{t-d+1}, \ldots, s_{t-1} \mid s_t, \mathbf{x}_1, \ldots, \mathbf{x}_t, \qquad (A.12)

where with \perp we denote independence. The relations in (A.12) hold for HMMs (even without conditioning on \mathbf{x}_1, \ldots, \mathbf{x}_t), but they do not hold for HSMMs, since the state duration (expressed by s_{t-d+1}, \ldots, s_{t-1}) determines the system evolution. On the other hand, the state duration is partially known through the observations \mathbf{x}_1, \ldots, \mathbf{x}_t. Thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A.6), (A.9), and (A.11) we obtain

\bar{d}_{t+1}(i) = (a) + \sum_{d=2}^{t+1} d \cdot (b) \cdot (c)
= \underbrace{P(s_{t-1} \neq S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1})}_{P(A, B \mid C) = P(A \mid B, C) \cdot P(B \mid C)}
+ \sum_{d=2}^{t+1} d \cdot \underbrace{P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1})}_{\text{does not depend on } d} \cdot P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t)
= P(s_{t-1} \neq S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1}) \cdot P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1})
+ P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1}) \cdot \sum_{d=2}^{t+1} d \cdot P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t)
= P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1}) \cdot \Big[ \underbrace{P(s_{t-1} \neq S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t, \mathbf{x}_{t+1})}_{\approx P(s_{t-1} \neq S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t) \text{ by the approximation of (A.11)}}
+ \sum_{d=2}^{t+1} d \cdot P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t) \Big]
= P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1}) \cdot \Big[ P(s_{t-1} \neq S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t) + \sum_{d'=1}^{t} (d' + 1) \cdot P(s_{t-d'-1} \neq S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t) \Big]. \qquad (A.13)

Noticing that

\sum_{d'=1}^{t} P(s_{t-d'-1} \neq S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t) + P(s_{t-1} \neq S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t) = 1, \qquad (A.14)

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time t, we can rewrite (A.13) as follows:

\bar{d}_{t+1}(i) = P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1}) \cdot (\bar{d}_t(i) + 1). \qquad (A.15)

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state i in the previous step.

In order to express (A.15) in terms of the model parameters, for an easy numerical calculation of the induction for \bar{d}_{t+1}(i), we can consider the following equality:

P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1}) = \frac{P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1 \cdots \mathbf{x}_{t+1})}{\underbrace{P(s_{t+1} = S_i \mid \mathbf{x}_1 \cdots \mathbf{x}_{t+1})}_{\gamma_{t+1}(i)}}. \qquad (A.16)

If we consider the terms involved in the probability at the numerator of the right-hand side of (A.16), we have that

\underbrace{\mathbf{x}_1, \ldots, \mathbf{x}_t}_{B} \perp \underbrace{\mathbf{x}_{t+1}}_{C} \mid \underbrace{s_t = S_i, s_{t+1} = S_i}_{A}. \qquad (A.17)

If B \perp C \mid A, by the Bayes rule we have that

P(A \mid C, B) = \frac{P(C \mid A, B) \cdot P(A \mid B)}{P(C \mid B)}. \qquad (A.18)

Hence we can rewrite the numerator of the right-hand side of (A.16) as follows:

P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1 \cdots \mathbf{x}_{t+1})
= \frac{P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1 \cdots \mathbf{x}_t) \cdot \overbrace{P(\mathbf{x}_{t+1} \mid s_t = S_i, s_{t+1} = S_i)}^{\mathbf{x}_{t+1} \perp s_t \mid s_{t+1}}}{P(\mathbf{x}_{t+1} \mid \mathbf{x}_1 \cdots \mathbf{x}_t)}
= \frac{P(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t) \cdot \overbrace{P(s_t = S_i \mid \mathbf{x}_1 \cdots \mathbf{x}_t)}^{\gamma_t(i)} \cdot \overbrace{P(\mathbf{x}_{t+1} \mid s_{t+1} = S_i)}^{b_i(\mathbf{x}_{t+1})}}{P(\mathbf{x}_{t+1} \mid \mathbf{x}_1 \cdots \mathbf{x}_t)}. \qquad (A.19)

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

P(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1 \cdots \mathbf{x}_t) = \sum_{d_t} a_{ii}(d_t) \cdot P(d_t \mid \mathbf{x}_1 \cdots \mathbf{x}_t) \approx a_{ii}(\bar{\mathbf{d}}_t), \qquad (A.20)

while the denominator of (A.19) can be expressed as follows:

P(\mathbf{x}_{t+1} \mid \mathbf{x}_1 \cdots \mathbf{x}_t) = \frac{P(\mathbf{x}_1 \cdots \mathbf{x}_t, \mathbf{x}_{t+1})}{P(\mathbf{x}_1 \cdots \mathbf{x}_t)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}. \qquad (A.21)

By substituting (A.20) and (A.21) in (A.19) we obtain

P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1 \cdots \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)}, \qquad (A.22)

and then, by combining (A.22) and (A.16), we obtain

P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1 \cdots \mathbf{x}_{t+1}) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \cdot \sum_{i=1}^{N} \alpha_{t+1}(i)}. \qquad (A.23)

Finally, by substituting (A.23) in (A.15) and considering that

\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)}, \qquad (A.24)

we derive the induction formula for \bar{d}_{t+1}(i) in terms of the model parameters as

\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot (\bar{d}_t(i) + 1). \qquad (A.25)
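A minimal numerical sketch of the induction (A.25) is given below, assuming that the forward variables, the duration-dependent self-transition probabilities, and the observation likelihoods are available as NumPy arrays; the small guard against a vanishing forward variable is an implementation detail, not part of the derivation.

```python
import numpy as np

def update_duration(d_bar, alpha_t, alpha_tp1, a_ii, b_x_tp1):
    """One step of the induction (A.25) for the expected state durations.

    d_bar     : current duration estimates, one per state        (d_bar_t(i))
    alpha_t   : forward variable at time t                       (alpha_t(i))
    alpha_tp1 : forward variable at time t+1                     (alpha_{t+1}(i))
    a_ii      : self-transition probabilities evaluated at d_bar (a_ii(d_bar_t))
    b_x_tp1   : observation likelihoods of x_{t+1} per state     (b_i(x_{t+1}))
    """
    # d_bar_{t+1}(i) = a_ii(d_bar_t) * alpha_t(i) * b_i(x_{t+1}) / alpha_{t+1}(i) * (d_bar_t(i) + 1)
    return (a_ii * alpha_t * b_x_tp1 / np.maximum(alpha_tp1, 1e-300)) * (d_bar + 1.0)
```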

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka, Japan and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines - prognostics - part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universitat zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 2006.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.


S_k. If we assume that the time to failure is a random variable D following a determinate probability density, we define the RUL at the current time t as

\mathrm{RUL}_t = \bar{D} = E(D), \quad s_{t+\bar{D}} = S_k, \; s_{t+\bar{D}-1} = S_i, \quad 1 \le i, k \le N, \; i \neq k, \qquad (47)

where E denotes the expected value.

Having fixed the failure state, the estimation of the RUL is performed in two steps every time a new observation is acquired (online):

(1) estimation of the current state;
(2) projection of the future state transitions until the failure state is reached, and estimation of the expected sojourn times.

The estimation of the current state is performed via the Viterbi path, that is, the variable \delta_t = [\delta_t(i)]_{1 \le i \le N} defined in (29). To correctly model the uncertainty of the current state estimation, we use the normalized variable \tilde{\delta}_t(i), obtained as

\tilde{\delta}_t(i) = \max_{s_1 s_2 \cdots s_{t-1}} P(s_t = S_i \mid s_1 s_2 \cdots s_{t-1}, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t, \lambda) = \frac{\delta_t(i)}{\sum_{j=1}^{N} \delta_t(j)}, \quad 1 \le i \le N, \qquad (48)

that is, an estimate of the probability of being in state S_i at time t.

Together with the normalized variable \tilde{\delta}_t(i), the maximum a posteriori estimate of the current state s^*_t is taken into account, according to (34). If s^*_t coincides with the failure state, the desired event is detected by the model and the time to this event is obviously zero. Otherwise, an estimation of the average remaining time in the current state, d_avg(s^*_t), is calculated as

d_avg(s^*_t) = \sum_{i=1}^{N} (\mu_{d_i} - \bar{d}_t(i)) \odot \tilde{\delta}_t(i), \qquad (49)

where with \mu_{d_i} we denote the expected value of the duration variable in state S_i, according to the duration distribution specified by the parameters \theta_i. Equation (49) thus estimates the remaining time in the current state by subtracting the estimated state duration \bar{d}_t(i) at time t from the expected sojourn time of state S_i, weighting the result using the uncertainty about the current state \tilde{\delta}_t(i), and finally summing up all the contributions from each state.

In addition to the average remaining time, a lower and an upper bound value can be calculated, based on the standard deviation \sigma_{d_i} of the duration distribution for state S_i:

d_low(s^*_t) = \sum_{i=1}^{N} (\mu_{d_i} - \sigma_{d_i} - \bar{d}_t(i)) \odot \tilde{\delta}_t(i), \qquad (50)

d_up(s^*_t) = \sum_{i=1}^{N} (\mu_{d_i} + \sigma_{d_i} - \bar{d}_t(i)) \odot \tilde{\delta}_t(i). \qquad (51)

Once the remaining time in the current state is estimated, the probability of the next state is calculated by multiplying the transpose of the nonrecurrent transition matrix by the current state probability estimation, as follows:

\tilde{\delta}_{next} = [\tilde{\delta}_{t+d}(i)]_{1 \le i \le N} = (A^0)^T \cdot \tilde{\delta}_t, \qquad (52)

while the maximum a posteriori estimate of the next state s^*_{next} is calculated as

s^*_{next} = s^*_{t+d} = \arg\max_{1 \le i \le N} \tilde{\delta}_{t+d}(i). \qquad (53)

Again, if s^*_{t+d} coincides with the failure state, the failure will happen after the remaining time at the current state is over, and the average estimation of the failure time is D_avg = d_avg(s^*_t), calculated at the previous step, with the bound values D_low = d_low(s^*_t) and D_up = d_up(s^*_t). Otherwise, the estimation of the sojourn time of the next state is calculated as follows:

d_avg(s^*_{t+d}) = \sum_{i=1}^{N} \mu_{d_i} \odot \tilde{\delta}_{t+d}(i), \qquad (54)

d_low(s^*_{t+d}) = \sum_{i=1}^{N} (\mu_{d_i} - \sigma_{d_i}) \odot \tilde{\delta}_{t+d}(i), \qquad (55)

d_up(s^*_{t+d}) = \sum_{i=1}^{N} (\mu_{d_i} + \sigma_{d_i}) \odot \tilde{\delta}_{t+d}(i). \qquad (56)

This procedure is repeated until the failure state is encountered in the prediction of the next state. The calculation of the RUL is then simply obtained by summing all the estimated remaining times in each intermediate state before encountering the failure state:

D_avg = \sum d_avg, \qquad (57)
D_low = \sum d_low, \qquad (58)
D_up = \sum d_up. \qquad (59)

Finally, Algorithm 1 details the above described RUL estimation procedure.

5. Experimental Results

To demonstrate the effectiveness of the proposed HSMM models, we make use of a series of experiments performed both on simulated and real data.

The simulated data were generated by considering a left-right HSMM and adapting the parameters of the artificial example reported in the work of Lee et al. [15]. The real case data are monitoring data related to the entire operational life of bearings, made available for the IEEE PHM 2012 data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).

(1) function RulEstimation(x_t, S_k)    ⊳ x_t: the last observation acquired; S_k: the failure state
(2)   Initialization:
(3)   D_avg ← 0
(4)   D_low ← 0
(5)   D_up ← 0
(6)   Current state estimation:
(7)   calculate δ̃_t        ⊳ using (48)
(8)   calculate s*_t        ⊳ using (34)
(9)   calculate d̄_t        ⊳ using (20)
(10)  S ← s*_t
(11)  Loop:
(12)  while S ≠ S_k do
(13)    calculate d_avg     ⊳ using (49) or (54)
(14)    calculate d_low     ⊳ using (50) or (55)
(15)    calculate d_up      ⊳ using (51) or (56)
(16)    D_avg ← D_avg + d_avg
(17)    D_low ← D_low + d_low
(18)    D_up ← D_up + d_up
(19)    calculate δ̃_next    ⊳ using (52)
(20)    calculate s*_next   ⊳ using (53)
(21)    S ← s*_next
(22)  end while
(23)  return D_avg, D_low, D_up

Algorithm 1: Remaining Useful Lifetime estimation (pseudocode).
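For concreteness, the listing below restates Algorithm 1 as a short Python sketch under the notation of (48)-(59); it is illustrative only and assumes that the caller supplies the normalized Viterbi variable, the duration estimates of (20), the per-state duration means and standard deviations, and a left-right nonrecurrent transition matrix.

```python
import numpy as np

def rul_estimation(delta_t, d_bar_t, mu_d, sigma_d, A0, failure_state):
    """Sketch of Algorithm 1.

    delta_t       : normalized Viterbi variable at time t, Eq. (48)
    d_bar_t       : estimated state durations at time t, Eq. (20)
    mu_d, sigma_d : mean / standard deviation of the duration densities per state
    A0            : nonrecurrent (left-right) transition matrix
    failure_state : index of the failure state S_k
    """
    D_avg = D_low = D_up = 0.0
    delta = np.asarray(delta_t, dtype=float).copy()
    current = int(np.argmax(delta))                     # s*_t, Eq. (34)
    first_step = True
    while current != failure_state:
        if first_step:                                  # Eqs. (49)-(51): remaining time in current state
            D_avg += np.sum((mu_d - d_bar_t) * delta)
            D_low += np.sum((mu_d - sigma_d - d_bar_t) * delta)
            D_up  += np.sum((mu_d + sigma_d - d_bar_t) * delta)
            first_step = False
        else:                                           # Eqs. (54)-(56): sojourn time of a projected state
            D_avg += np.sum(mu_d * delta)
            D_low += np.sum((mu_d - sigma_d) * delta)
            D_up  += np.sum((mu_d + sigma_d) * delta)
        delta = A0.T @ delta                            # Eq. (52): project the state probability
        current = int(np.argmax(delta))                 # Eq. (53)
    return D_avg, D_low, D_up                           # Eqs. (57)-(59)
```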

5.1. Simulated Experiment. Data have been generated with the idea of simulating the behavior of an industrial machine that, during its functioning, experiences several degradation modalities until a failure state is reached at the end of its lifetime. The generated data are used to test the performances of our methodology for (i) automatic model selection, (ii) online condition monitoring, and (iii) Remaining Useful Lifetime estimation, considering both continuous and discrete observations.

5.1.1. Data Generation. The industrial machine subject of these experiments has been modeled as a left-right parametric HSMM with N = 5 states, having state S_5 as the absorbing (failure) state. The choice of a left-right setting has been made for simplicity reasons, since the primary goal of this work is to demonstrate that the proposed model specification, coupled with the Akaike Information Criterion, is effective to solve automatic model selection, online condition monitoring, and prediction problems. For this purpose, we divided the experiments in two cases, according to the nature of the observations, being continuous or discrete.

For each of the continuous and the discrete cases, three data sets have been generated by considering the following duration models: Gaussian, Gamma, and Weibull densities. For each of the three data sets, 30 series of data are used as training set and 10 series as testing set. Each time series contains T = 650 observations. The parameters used to generate the data are taken from the work of Lee et al. [15] and are adapted to obtain an equivalent left-right parametric HSMM as follows:

\pi = [1, 0, 0, 0, 0]^T,

A^0 =
[ 0 1 0 0 0
  0 0 1 0 0
  0 0 0 1 0
  0 0 0 0 1
  0 0 0 0 1 ],

\Theta_N = \{ \theta_1 = [100, 20], \theta_2 = [90, 15], \theta_3 = [100, 20], \theta_4 = [80, 25], \theta_5 = [200, 1] \},
\Theta_G = \{ \theta_1 = [500, 0.2], \theta_2 = [540, 0.1667], \theta_3 = [500, 0.2], \theta_4 = [256, 0.3125], \theta_5 = [800, 0.005] \},
\Theta_W = \{ \theta_1 = [102, 28], \theta_2 = [92, 29], \theta_3 = [102, 28], \theta_4 = [82, 20], \theta_5 = [200, 256] \}, \qquad (60)

where \Theta_N, \Theta_G, and \Theta_W are the different distribution parameters used for the Gaussian, Gamma, and Weibull duration models, respectively. More precisely, they represent the values of the mean \mu_d and the variance \sigma^2_d of the Gaussian distribution, the shape \nu_d and the scale \eta_d of the Gamma distribution, and the scale a_d and the shape b_d of the Weibull distribution.

Figure 2: The data generated with the parameters described in Section 5.1.1, for the continuous case (a) and the discrete case (b); each example shows the hidden state sequence, the state duration, and the observed signal or symbols.

It must be noticed that, as explained in Section 2.1, for state S_5, being the absorbing state, the duration parameters \theta_5 have no influence on the data, since once the state S_5 is reached the system will remain there forever.

Concerning the continuous observation modeling, a bivariate Gaussian distribution has been used, with the following parameters [15]:

\mu_1 = [20, 20]^T, \quad \mu_2 = [20, 35]^T, \quad \mu_3 = [35, 35]^T, \quad \mu_5 = [28, 28]^T,

U_1 = [20, 0; 0, 20], \quad U_2 = [15, 0; 0, 15], \quad U_3 = [15, -2; -2, 15], \quad U_4 = [5, 0; 0, 5], \quad U_5 = [10, 3; 3, 10], \qquad (61)

while for the discrete case L = 7 distinct observation symbols have been taken into consideration, with the following observation probability distribution:

B =
[ 0.8 0.2 0   0   0   0   0
  0.1 0.8 0.1 0   0   0   0
  0   0.1 0.8 0.1 0   0   0
  0   0   0.1 0.7 0.1 0.1 0
  0   0   0   0.2 0.6 0.1 0.1 ]. \qquad (62)

An example of simulated data, both for the continuous and the discrete cases, is shown in Figure 2, where a Gaussian duration model has been used.
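A possible way to generate one such series for the continuous case is sketched below. It is a toy generator under the parameterizations stated above (mean/variance for the Gaussian, shape/scale for the Gamma, and scale/shape for the Weibull durations), not the authors' original code; the state means and covariances are supplied by the caller.

```python
import numpy as np

def sample_duration(family, theta, rng):
    """Draw one integer sojourn time from the chosen parametric family."""
    if family == "gaussian":            # theta = [mean, variance]
        d = rng.normal(theta[0], np.sqrt(theta[1]))
    elif family == "gamma":             # theta = [shape, scale]
        d = rng.gamma(theta[0], theta[1])
    elif family == "weibull":           # theta = [scale, shape]
        d = theta[0] * rng.weibull(theta[1])
    else:
        raise ValueError(family)
    return max(1, int(round(d)))

def generate_sequence(duration_family, Theta, means, covs, T=650, rng=None):
    """Simulate one left-right HSMM run of length T with bivariate Gaussian observations."""
    rng = rng or np.random.default_rng()
    states, obs, s = [], [], 0          # the chain starts in state 1 (pi = [1, 0, 0, 0, 0])
    while len(states) < T:
        last = len(Theta) - 1
        d = T - len(states) if s == last else sample_duration(duration_family, Theta[s], rng)
        for _ in range(min(d, T - len(states))):
            states.append(s)
            obs.append(rng.multivariate_normal(means[s], covs[s]))
        s = min(s + 1, last)            # left-right topology: move to the next state
    return np.array(states), np.array(obs)
```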

5.1.2. Training and Model Selection. The goal of this experimental phase is to test the effectiveness of the AIC in solving the automatic model selection problem. For this purpose, the training sets of the 6 data sets (continuous/discrete observations with Gaussian, Gamma, and Weibull duration models) have been taken individually, and for each one of them a series of learning procedures has been run, each one with a variable HSMM structure. In particular, we took into account all the combinations of the duration distribution families (Gaussian, Gamma, and Weibull), an increasing number of states N from 2 to 8, and, for the continuous observation cases, an increasing number of Gaussian mixtures M in the observation distribution from 1 to 4.

As accurate parameter initialization is crucial for obtaining a good model fitting [14], a series of 40 learning procedures, corresponding to 40 random initializations of the initial parameters λ0, has been executed for each of the considered HSMM structures. For each model structure, the AIC value, as defined in (46), has been evaluated. The final trained set of parameters λ*, corresponding to the minimum AIC value, has been retained, resulting in 7 HSMMs with a number of states from 2 to 8.
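The structure search just described can be summarized as in the sketch below; train_hsmm, log_likelihood, random_initialization, and num_free_parameters are hypothetical stand-ins for the learning routine, the forward-pass likelihood, the initialization of λ0, and the count of free parameters, and (46) is assumed here to be the standard AIC, 2k − 2 ln L.

```python
import numpy as np

def select_structure(train_set, candidate_structures, n_restarts=40, seed=0):
    """Pick the (duration family, N, M) combination with the minimum AIC (sketch)."""
    rng = np.random.default_rng(seed)
    best = None
    for structure in candidate_structures:            # (duration family, N, M) combinations
        best_ll, best_lam = -np.inf, None
        for _ in range(n_restarts):                   # random initializations of lambda_0
            lam0 = random_initialization(structure, rng)   # hypothetical helper
            lam = train_hsmm(train_set, structure, lam0)   # hypothetical EM learning
            ll = log_likelihood(lam, train_set)            # hypothetical forward pass
            if ll > best_ll:
                best_ll, best_lam = ll, lam
        aic = 2 * num_free_parameters(structure) - 2 * best_ll   # assumed form of Eq. (46)
        if best is None or aic < best[0]:
            best = (aic, structure, best_lam)
    return best                                        # minimum-AIC structure and parameters
```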

The obtained results are shown in Figure 3, for both the continuous and the discrete observation data. As it can be noticed, for all the 6 test cases of Figure 3 the AIC values do not improve much for a number of states higher than 5, meaning that adding more states does not add considerable information to the HSMM modeling power. Hence, 5 states can be considered as an optimal number of states. Moreover, as shown in the zoomed sections of Figure 3, for the HSMMs with 5 states the minimum AIC values are obtained for the duration distributions corresponding to the ones used to generate the data. As a consequence, AIC can be considered as an effective approach to perform model selection for HSMMs, as well as to select the appropriate parametric distribution family for the state duration modeling.

5.1.3. Condition Monitoring. The optimal parameters λ* obtained in the previous phase have been used to perform the online condition monitoring experiment on the 10 testing cases, for all the 6 considered HSMM configurations. In this experiment we simulate online monitoring by considering all the testing observations up to the current time, that is, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t. Each time a new data point is acquired, the Viterbi algorithm is used to estimate the current state s^*_t = \arg\max_{1 \le i \le N}[\delta_t(i)], as specified in (34).

Figure 3: Akaike Information Criterion (AIC) values for continuous observation data with Gaussian (a), Gamma (b), and Weibull (c) duration distributions, and for discrete observation data with Gaussian (d), Gamma (e), and Weibull (f) duration distributions, plotted against the number of states for Gaussian, Gamma, and Weibull duration models. AIC is effective for automatic model selection, since its minimum value provides the same number of states and duration model used to generate the data.

An example of execution of the condition monitoring experiment is shown in Figure 4, for continuous and discrete observations, respectively. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.

Figure 4: Condition monitoring using the Viterbi path: state estimation for continuous data with a Gamma duration distribution (a) and for discrete data with a Gaussian duration distribution (b), with Viterbi path accuracies of 0.985 and 0.992, respectively. HSMMs can be effective to solve condition monitoring problems in time-dependent applications, due to their high accuracy in hidden state recognition.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
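The accuracy used here is simply the fraction of correctly recovered states; a minimal sketch is given below.

```python
import numpy as np

def state_accuracy(true_states, viterbi_states):
    """Fraction of time steps whose hidden state is correctly recovered by the Viterbi path."""
    true_states = np.asarray(true_states)
    viterbi_states = np.asarray(viterbi_states)
    return float(np.mean(true_states == viterbi_states))
```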

5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase we considered the state S_5 as the failure state and the trained parameters λ* of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t up to time t. When a new observation is acquired, after the current state probability \tilde{\delta}_t(i) is estimated (Equation (48)), the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures one can notice that the average as well as the lower and the upper bound estimations converge to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches and the estimation becomes more precise, until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.

To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time t, the absolute prediction error (APE) between the real RUL and the predicted value, defined as

\mathrm{APE}(t) = |\mathrm{RUL}_{\mathrm{real}}(t) - \widehat{\mathrm{RUL}}(t)|, \qquad (63)

where \mathrm{RUL}_{\mathrm{real}}(t) is the (known) value of the RUL at time t, while \widehat{\mathrm{RUL}}(t) is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as

\overline{\mathrm{APE}} = \frac{\sum_{t=1}^{T} \mathrm{APE}(t)}{T}, \qquad (64)

where T is the length of the testing signal. APE being a prediction error, average values of (64) close to zero correspond to good predictive performances.

Finally we tested our RUL estimationmethodology usingthe state duration estimation of (16) introduced byAzimi et al[30ndash32] The results are shown in Tables 3(a) and 3(b)in which respectively the prediction errors obtained forcontinuous and discrete observations are reported

Comparing Table 2 and Table 3 one can notice that theproposed RUL method outperforms the one of Azimi This

Figure 5: Remaining Useful Lifetime estimation for continuous data with a Weibull duration distribution (a) and for discrete data with a Gamma duration distribution (b), showing the true, average, lower, and upper RUL over time. HSMMs effectively solve RUL estimation problems: the prediction converges to the actual RUL value and its uncertainty decreases as the real failure time approaches.

Table 1: State recognition accuracy (values in %).

(a) Continuous observations

Test case    Gaussian    Gamma    Weibull
1            99.4        98.5     99.2
2            99.7        98.6     99.5
3            99.4        99.2     99.7
4            98.9        98.9     99.7
5            98.2        98.9     100
6            99.1        98.8     99.7
7            98.5        99.4     99.7
8            99.2        99.1     99.5
9            99.2        98.6     99.7
10           99.2        99.1     99.5
Average      99.1        98.9     99.6

(b) Discrete observations

Test case    Gaussian    Gamma    Weibull
1            97.4        96.7     97.4
2            97.2        97.6     96.5
3            99.4        95.8     96.6
4            98.2        95.3     97.7
5            99.1        97.4     97.5
6            97.8        97.7     97.8
7            95.8        97.2     96.6
8            97.7        96.4     97.2
9            98.9        97.2     98.5
10           99.2        95.6     96.9
Average      98.1        96.7     97.3

Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20).

(a) APE of the RUL estimation for the continuous observation test cases

Test case    Gaussian (avg / up / low)    Gamma (avg / up / low)    Weibull (avg / up / low)
1            5.1 / 17.0 / 6.7             14.0 / 29.0 / 0.91        4.5 / 17.0 / 8.1
2            7.6 / 19.0 / 5.0             6.1 / 21.0 / 8.5          6.6 / 19.0 / 6.1
3            7.7 / 5.4 / 19.0             2.9 / 12.0 / 17.0         16.0 / 29.0 / 3.0
4            9.0 / 21.0 / 2.9             7.5 / 22.0 / 6.8          6.0 / 19.0 / 6.7
5            7.3 / 19.0 / 4.7             2.2 / 14.0 / 14.0         3.9 / 17.0 / 8.7
6            6.5 / 18.0 / 5.6             5.1 / 18.0 / 10.0         14.0 / 27.0 / 2.7
7            4.7 / 16.0 / 7.5             4.8 / 17.0 / 11.0         1.2 / 13.0 / 12.0
8            10.0 / 22.0 / 2.9            5.2 / 18.0 / 10.0         9.2 / 22.0 / 3.9
9            3.1 / 9.2 / 14.0             2.0 / 16.0 / 13.0         8.2 / 21.0 / 4.9
10           6.4 / 18.0 / 5.6             7.5 / 22.0 / 6.9          3.3 / 12.0 / 13.0
Average      6.8 / 17.0 / 7.4             5.7 / 19.0 / 9.9          7.3 / 20.0 / 7.0

(b) APE of the RUL estimation for the discrete observation test cases

Test case    Gaussian (avg / up / low)    Gamma (avg / up / low)    Weibull (avg / up / low)
1            2.1 / 11.0 / 14.0            3.1 / 8.8 / 14.0          2.4 / 12.0 / 13.0
2            2.1 / 11.0 / 13.0            11.0 / 22.0 / 3.3         19.0 / 32.0 / 7.1
3            5.1 / 17.0 / 7.6             6.6 / 18.0 / 5.1          2.3 / 14.0 / 11.0
4            5.9 / 6.5 / 18.0             5.2 / 17.0 / 6.7          4.2 / 16.0 / 9.0
5            3.2 / 14.0 / 10.0            8.3 / 19.0 / 3.4          12.0 / 24.0 / 2.9
6            12.0 / 24.0 / 2.7            6.2 / 18.0 / 5.2          4.1 / 8.4 / 16.0
7            2.9 / 15.0 / 9.7             9.3 / 21.0 / 2.3          19.0 / 31.0 / 6.6
8            15.0 / 27.0 / 7.0            7.4 / 18.0 / 4.3          4.3 / 17.0 / 9.4
9            5.9 / 18.0 / 7.7             11.0 / 23.0 / 5.5         3.9 / 16.0 / 8.8
10           3.5 / 11.0 / 14.0            5.5 / 6.0 / 16.0          5.2 / 17.0 / 7.1
Average      5.7 / 15.0 / 10.0            7.4 / 17.0 / 6.6          7.7 / 19.0 / 9.0

Comparing Tables 2 and 3, one can notice that the proposed RUL method outperforms the one of Azimi. This is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi given by (16).

5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comté Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besançon, France), with the aim of collecting real data related to accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32].

(a) APE of the RUL estimation for the continuous observation test cases

Test case    Gaussian (avg / up / low)    Gamma (avg / up / low)    Weibull (avg / up / low)
1            57.8 / 51.0 / 66.8           26.2 / 9.7 / 52.7         25.9 / 28.4 / 64.6
2            50.2 / 44.4 / 57.7           21.3 / 17.0 / 46.9        29.0 / 19.2 / 70.8
3            50.3 / 44.7 / 57.3           27.1 / 8.7 / 56.5         34.5 / 13.9 / 73.4
4            51.8 / 46.0 / 60.4           21.3 / 14.3 / 45.9        34.9 / 17.1 / 78.7
5            59.4 / 53.7 / 66.2           29.0 / 9.5 / 55.4         33.4 / 15.6 / 74.9
6            58.0 / 51.7 / 67.1           25.8 / 8.3 / 54.1         23.1 / 25.8 / 66.5
7            59.4 / 53.6 / 66.9           18.2 / 12.5 / 47.7        36.0 / 17.1 / 74.4
8            63.4 / 55.6 / 72.3           19.4 / 15.7 / 44.1        34.8 / 17.8 / 77.0
9            49.1 / 43.5 / 57.0           14.5 / 17.1 / 43.2        25.1 / 26.7 / 67.0
10           54.4 / 48.4 / 62.8           23.2 / 7.9 / 52.7         24.1 / 24.5 / 67.4
Average      55.4 / 49.3 / 63.5           22.6 / 12.1 / 49.9        30.1 / 20.6 / 71.5

(b) APE of the RUL estimation for the discrete observation test cases

Test case    Gaussian (avg / up / low)    Gamma (avg / up / low)    Weibull (avg / up / low)
1            51.4 / 41.0 / 62.4           42.4 / 31.8 / 53.0        32.6 / 26.4 / 73.6
2            49.6 / 39.9 / 60.4           59.5 / 48.3 / 70.8        31.3 / 27.6 / 69.3
3            50.2 / 38.6 / 62.3           46.5 / 35.7 / 57.4        32.4 / 25.7 / 70.2
4            42.2 / 31.5 / 53.8           50.1 / 40.5 / 60.6        23.7 / 36.1 / 60.3
5            44.3 / 33.9 / 55.8           47.8 / 37.4 / 59.1        36.0 / 25.6 / 76.5
6            52.2 / 43.2 / 62.7           55.2 / 44.3 / 66.9        27.2 / 31.6 / 64.3
7            55.0 / 43.9 / 66.8           56.0 / 45.7 / 67.0        34.7 / 23.2 / 74.4
8            50.3 / 39.0 / 62.0           60.4 / 50.5 / 71.0        35.1 / 26.4 / 72.4
9            55.5 / 47.4 / 64.0           48.0 / 37.2 / 59.5        31.8 / 22.2 / 73.6
10           49.0 / 38.2 / 60.7           52.1 / 41.2 / 63.1        29.4 / 28.9 / 68.7
Average      50.0 / 39.7 / 61.1           51.8 / 41.3 / 62.9        31.4 / 27.4 / 70.4

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.

The load profiles part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.

The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s, collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).

Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1 (1800 rpm and 4000 N):  Bearing1_1: 28030, Bearing1_2: 8710, Bearing1_3: 23750, Bearing1_4: 14280, Bearing1_5: 24630, Bearing1_6: 24480, Bearing1_7: 22590.
Condition 2 (1650 rpm and 4200 N):  Bearing2_1: 9110, Bearing2_2: 7970, Bearing2_3: 19550, Bearing2_4: 7510, Bearing2_5: 23110, Bearing2_6: 7010, Bearing2_7: 2300.
Condition 3 (1500 rpm and 5000 N):  Bearing3_1: 5150, Bearing3_2: 16370, Bearing3_3: 4340.

Figure 6: Global overview of the Pronostia experimental platform [19], showing the rotating module, the load module, the tested bearing, and the data acquisition module.

Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;

(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;

(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804 DD, which can operate at a maximum speed of 13000 rpm and with a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].

In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots (L = 2560). Let r_w(t) be the raw signal of the w-th window; for each w we estimate RMS and kurtosis as

$$x_w^{\mathrm{RMS}} = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r_w^2(t)}, \qquad x_w^{\mathrm{KURT}} = \frac{(1/L)\sum_{t=1}^{L}\left(r_w(t)-\bar{r}_w\right)^4}{\left[(1/L)\sum_{t=1}^{L}\left(r_w(t)-\bar{r}_w\right)^2\right]^{2}},$$

where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
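A minimal Python sketch of this per-window feature extraction is given below. It assumes each 0.1 s snapshot is already available as a one-dimensional array of 2560 raw acceleration samples; the variable names and the way snapshots are collected are illustrative and not part of the original work.

```python
import numpy as np

def extract_features(snapshot):
    """Compute RMS and (non-excess) kurtosis of one raw vibration snapshot."""
    r = np.asarray(snapshot, dtype=float)
    rms = np.sqrt(np.mean(r ** 2))                # x_RMS = sqrt((1/L) * sum r^2)
    centered = r - r.mean()
    kurt = np.mean(centered ** 4) / np.mean(centered ** 2) ** 2
    return rms, kurt

# Hypothetical usage: one 2-dimensional feature vector per snapshot
# (snapshots: iterable of length-2560 arrays, one every 10 s of bearing life)
# features = np.array([extract_features(s) for s in snapshots])   # shape (n_windows, 2)
```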

5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states N from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations λ_0, on the data sets (Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).


Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].

Figure 8: Raw vibration data (a) versus extracted RMS and kurtosis features (b) for Bearing1_1.

Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases, the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and an M = 1 Gaussian mixture for the observation density.
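The structure search just described is essentially a grid search scored by AIC. The sketch below illustrates that loop; the fitting routine is passed in as an argument because the paper's learning procedure is not reproduced here, and the AIC of Equation (46) is assumed to take the standard form 2k - 2 ln L̂. All names are illustrative.

```python
from itertools import product

def select_structure(train_set, fit_hsmm, n_restarts=120):
    """Grid search over HSMM structures, scored by AIC.
    fit_hsmm(train_set, n_states, duration_family, n_mixtures, seed) is assumed
    to return (log_likelihood, n_free_parameters); it stands in for the paper's
    learning procedure and is not defined here."""
    best = None
    for family, n_states, n_mix in product(("gaussian", "gamma", "weibull"),
                                           range(2, 7),     # N = 2..6 states
                                           range(1, 5)):    # M = 1..4 mixtures
        # keep the best of the random restarts for this candidate structure
        fits = [fit_hsmm(train_set, n_states, family, n_mix, seed)
                for seed in range(n_restarts)]
        loglik, k = max(fits, key=lambda f: f[0])
        aic = 2 * k - 2 * loglik            # assumed standard AIC form of Eq. (46)
        if best is None or aic < best[0]:
            best = (aic, family, n_states, n_mix)
    return best   # (AIC, duration family, N, M) of the selected structure
```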

(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme by using, for condition 1, at each iteration Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on.


Figure 9: AIC values for Condition 1 (a) and Condition 2 (b) as a function of the number of states, for the Gaussian, Gamma, and Weibull duration models. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.

Figure 10: RUL estimation for Bearing1_7 (a) and Bearing2_6 (b), showing the true, average, upper, and lower RUL over time. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.

As can be seen, the average as well as the lower and upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty of the estimation decreases with time.
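The online loop behind these curves repeatedly adds the expected residual sojourn time of the current state and the expected sojourn times of the states still to be visited before the failure state (cf. Algorithm 1). The following is a simplified, self-contained sketch of that idea for a left-right model; it reproduces only the average RUL and stands in for the paper's Equations (48)-(59), so the function and argument names are assumptions, and the lower/upper bounds of (50)-(51) and (55)-(56) are omitted.

```python
def online_rul(mean_dur, current_state, elapsed, failure_state):
    """Simplified online RUL sketch for a left-right HSMM.
    mean_dur[i]   : expected sojourn time of transient state i (model parameter)
    current_state : state index estimated online (e.g. from the Viterbi path)
    elapsed       : estimated time already spent in the current state
    Returns the expected time until failure_state is entered."""
    rul = max(mean_dur[current_state] - elapsed, 0.0)   # residual time in the current state
    for s in range(current_state + 1, failure_state):   # full expected sojourns of later states
        rul += mean_dur[s]
    return rul

# Hypothetical usage: 5-state left-right model (states 0..4), failure state index 4
print(online_rul(mean_dur=[100, 90, 100, 80], current_state=1, elapsed=30, failure_state=4))
```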

Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability of the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1 and further decreases to 14 minutes for condition 2.
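The metric itself is straightforward to compute once the predicted and true RUL histories of a test bearing are available; a minimal sketch (array names are illustrative):

```python
import numpy as np

def average_absolute_prediction_error(rul_true, rul_pred):
    """Average absolute prediction error over the whole test history (Eq. (64)):
    mean over t of |RUL_real(t) - RUL(t)|, here in seconds."""
    rul_true, rul_pred = np.asarray(rul_true), np.asarray(rul_pred)
    return np.mean(np.abs(rul_true - rul_pred))
```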

6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMM) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the small number of parameters that need to be estimated.


Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1

Test bearing   APEavg    APElow    APEup
Bearing1_1     10571.6   12723.0   9414.6
Bearing1_2     4331.2    3815.6    3821.3
Bearing1_3     2997.0    9730.9    6091.2
Bearing1_4     6336.3    2876.6    14871.9
Bearing1_5     1968.9    7448.4    10411.5
Bearing1_6     4253.0    9896.4    9793.7
Bearing1_7     1388.0    7494.3    10088.1
Average        4549.4    7712.2    9213.2

(b) Condition 2

Test bearing   APEavg    APElow    APEup
Bearing2_1     2475.9    5006.5    7287.5
Bearing2_2     1647.3    4497.2    8288.6
Bearing2_3     8877.1    9508.3    7962.1
Bearing2_4     1769.8    4248.6    4982.5
Bearing2_5     8663.1    10490.0   10730.0
Bearing2_6     877.1     3504.7    6687.0
Bearing2_7     3012.5    3866.4    6651.9
Average        3903.3    5874.5    7512.8

Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as

$$\hat{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\left(\hat{d}_t(i)+1\right). \quad (A.1)$$

The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution

$$d_t(i) \sim f(d). \quad (A.2)$$

We can specify the probability that the system has been in state $i$ for $d$ time units prior to current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as

$$P(d_t(i)=d) = P\left(s_{t-d-1}\neq S_i, s_{t-d}=S_i, \ldots, s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t, \lambda\right). \quad (A.3)$$

We omit the conditioning on the model parameters $\lambda$ in the following equations, being inherently implied. We are interested in deriving the estimator $\hat{d}_t(i)$ of $d_t(i)$, defined as its expected value (see Equation (15)):

$$\hat{d}_t(i) = E\left(d_t(i) \mid s_t=S_i, \mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t\right), \quad 1\le i\le N. \quad (A.4)$$

From the definition of expectation we have

$$\hat{d}_t(i) = \sum_{d=1}^{t} d\cdot P(d_t(i)=d) = \sum_{d=1}^{t} d\cdot P\left(s_{t-d-1}\neq S_i, s_{t-d}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right). \quad (A.5)$$

For $\hat{d}_{t+1}(i)$ we have

$$\begin{aligned}
\hat{d}_{t+1}(i) &= \sum_{d=1}^{t+1} d\cdot P\left(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)\\
&= \underbrace{P\left(s_{t-1}\neq S_i, s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}_{(a)} \quad (A.6)\\
&\quad + \sum_{d=2}^{t+1} d\cdot P\left(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right). \quad (A.7)
\end{aligned}$$

By noticing that

$$P\left(s_{t-d}\neq S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) = \frac{P\left(s_{t-d}\neq S_i,\ldots,s_{t-1}=S_i, s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}{P\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}, \quad (A.8)$$

we can replace the probability of the second term of (A.7) with

$$\begin{aligned}
&P\left(s_{t-d}\neq S_i,\ldots,s_{t-1}=S_i, s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)\\
&\quad = \underbrace{P\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}_{(b)} \quad (A.9)\\
&\qquad \cdot\, P\left(s_{t-d}\neq S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right). \quad (A.10)
\end{aligned}$$

In the last factor of (A.10) we can omit the information about the current state and observation by observing that

$$P\left(s_{t-d}\neq S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) \approx \underbrace{P\left(s_{t-d}\neq S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)}_{(c)}, \quad (A.11)$$

if the following independencies hold:

$$s_{t+1} \perp \{s_{t-d+1},\ldots,s_{t-1}\} \mid \{s_t, \mathbf{x}_1,\ldots,\mathbf{x}_t\}, \qquad \mathbf{x}_{t+1} \perp \{s_{t-d+1},\ldots,s_{t-1}\} \mid \{s_t, \mathbf{x}_1,\ldots,\mathbf{x}_t\}, \quad (A.12)$$

where with $\perp$ we denote independency. Equation (A.12) holds for HMMs (even without conditioning on $\mathbf{x}_1,\ldots,\mathbf{x}_t$), but it does not hold for HSMMs, since the state duration (expressed by $s_{t-d+1},\ldots,s_{t-1}$) determines the system evolution. On the other hand, the state duration is partially known through the observations $\mathbf{x}_1,\ldots,\mathbf{x}_t$; thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A.6), (A.9), and (A.11) we obtain

$$\begin{aligned}
\hat{d}_{t+1}(i) &= (a) + \sum_{d=2}^{t+1} d\cdot(b)\cdot(c)\\
&= \underbrace{P\left(s_{t-1}\neq S_i, s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}_{P(A,B\mid C)=P(A\mid B,C)\cdot P(B\mid C)}\\
&\quad + \sum_{d=2}^{t+1} d\cdot \underbrace{P\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}_{\text{does not depend on } d} \cdot P\left(s_{t-d}\neq S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)\\
&= P\left(s_{t-1}\neq S_i \mid s_t=S_i, s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)\cdot P\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)\\
&\quad + P\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)\cdot \sum_{d=2}^{t+1} d\cdot P\left(s_{t-d}\neq S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)\\
&= P\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)\cdot \Bigg[\underbrace{P\left(s_{t-1}\neq S_i \mid s_t=S_i, s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t,\mathbf{x}_{t+1}\right)}_{\approx\, P\left(s_{t-1}\neq S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)\text{ by the approximation of (A.11)}}\\
&\qquad + \sum_{d=2}^{t+1} d\cdot P\left(s_{t-d}\neq S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)\Bigg]\\
&= P\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)\\
&\quad \cdot \Bigg[P\left(s_{t-1}\neq S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right) + \sum_{d'=1}^{t} (d'+1)\cdot P\left(s_{t-d'-1}\neq S_i, s_{t-d'}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)\Bigg]. \quad (A.13)
\end{aligned}$$

Noticing that

$$\sum_{d'=1}^{t} P\left(s_{t-d'-1}\neq S_i, s_{t-d'}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right) + P\left(s_{t-1}\neq S_i \mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right) = 1, \quad (A.14)$$

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, and using (A.5), we can rewrite (A.13) as follows:

$$\hat{d}_{t+1}(i) = P\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)\cdot\left(\hat{d}_t(i)+1\right). \quad (A.15)$$

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state $i$ in the previous step.

In order to express (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $\hat{d}_{t+1}(i)$, we can consider the following equality:

$$P\left(s_t=S_i \mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) = \frac{P\left(s_t=S_i, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}{\underbrace{P\left(s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right)}_{\gamma_{t+1}(i)}}. \quad (A.16)$$

If we consider the terms involved in the probability at the numerator of the right-hand side of (A.16), we have that

$$\underbrace{\mathbf{x}_1,\ldots,\mathbf{x}_t}_{B} \;\perp\; \underbrace{\mathbf{x}_{t+1}}_{C} \;\Big|\; \underbrace{s_t=S_i,\, s_{t+1}=S_i}_{A}. \quad (A.17)$$

If $B\perp C\mid A$, by the Bayes rule we have that

$$P(A\mid C,B) = \frac{P(C\mid A,B)\cdot P(A\mid B)}{P(C\mid B)}. \quad (A.18)$$

Hence we can rewrite the numerator of the right-hand side of (A.16) as follows:

$$\begin{aligned}
P\left(s_t=S_i, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) &= \frac{P\left(s_t=S_i, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_t\right)\cdot \overbrace{P\left(\mathbf{x}_{t+1}\mid s_t=S_i, s_{t+1}=S_i\right)}^{\mathbf{x}_{t+1}\,\perp\, s_t \mid s_{t+1}}}{P\left(\mathbf{x}_{t+1}\mid \mathbf{x}_1,\ldots,\mathbf{x}_t\right)}\\
&= \frac{P\left(s_{t+1}=S_i\mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right)\cdot \overbrace{P\left(s_t=S_i\mid \mathbf{x}_1,\ldots,\mathbf{x}_t\right)}^{\gamma_t(i)} \cdot \overbrace{P\left(\mathbf{x}_{t+1}\mid s_{t+1}=S_i\right)}^{b_i(\mathbf{x}_{t+1})}}{P\left(\mathbf{x}_{t+1}\mid \mathbf{x}_1,\ldots,\mathbf{x}_t\right)}. \quad (A.19)
\end{aligned}$$

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

$$P\left(s_{t+1}=S_i\mid s_t=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_t\right) = \sum_{d_t} a_{ii}(d_t)\cdot P\left(d_t\mid \mathbf{x}_1,\ldots,\mathbf{x}_t\right) \approx a_{ii}(\bar{\mathbf{d}}_t), \quad (A.20)$$

while the denominator of (A.19) can be expressed as follows:

$$P\left(\mathbf{x}_{t+1}\mid \mathbf{x}_1,\ldots,\mathbf{x}_t\right) = \frac{P\left(\mathbf{x}_1,\ldots,\mathbf{x}_t,\mathbf{x}_{t+1}\right)}{P\left(\mathbf{x}_1,\ldots,\mathbf{x}_t\right)} = \frac{\sum_{i=1}^{N}\alpha_{t+1}(i)}{\sum_{i=1}^{N}\alpha_t(i)}. \quad (A.21)$$

By substituting (A.20) and (A.21) in (A.19) we obtain

$$P\left(s_t=S_i, s_{t+1}=S_i\mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)}, \quad (A.22)$$

and then, by combining (A.22) and (A.16), we obtain

$$P\left(s_t=S_i\mid s_{t+1}=S_i, \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\right) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\,\sum_{i=1}^{N}\alpha_{t+1}(i)}. \quad (A.23)$$

Finally, by substituting (A.23) in (A.15) and considering that

$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)}, \quad (A.24)$$

we derive the induction formula for $\hat{d}_{t+1}(i)$ in terms of model parameters as

$$\hat{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\left(\hat{d}_t(i)+1\right). \quad (A.25)$$
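Numerically, (A.25) amounts to a simple per-state update at each new observation. The following is a minimal vectorized sketch of that update, assuming the forward variables, the duration-dependent self-transition probabilities evaluated at the current average durations, and the observation likelihoods have already been computed by the forward pass; the function and argument names are illustrative.

```python
import numpy as np

def update_duration_estimate(d_prev, alpha_t, alpha_t1, a_ii_dbar, b_x_t1):
    """One induction step of the average-duration estimator of Eq. (A.25).
    All arguments are per-state NumPy arrays of length N:
      d_prev    : previous duration estimates d_t(i)
      alpha_t   : forward variables alpha_t(i)
      alpha_t1  : forward variables alpha_{t+1}(i)
      a_ii_dbar : self-transition probabilities a_ii evaluated at the average durations
      b_x_t1    : observation likelihoods b_i(x_{t+1})"""
    weight = a_ii_dbar * alpha_t * b_x_t1 / alpha_t1   # = P(s_t = S_i | s_{t+1} = S_i, x_1..x_{t+1})
    return weight * (d_prev + 1.0)                     # d_{t+1}(i) = weight * (d_t(i) + 1)
```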

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka, Japan and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.

[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.

[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.

[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.

[5] AFNOR, "Condition monitoring and diagnostics of machines - prognostics - part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.

[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.

[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.

[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.

[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.

[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.

[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.

[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.

[13] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.

[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.

[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.

[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.

[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.

[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.

[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.

[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.

[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.

[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.

[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.

[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.

[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.

[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.

[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.

[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.

[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.

[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.

[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.

[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.

[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.

[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.

[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.

[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.

[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.

[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.

[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.

[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.

[41] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.

[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.


[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.

[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.

[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.

[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 2006.

[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.

[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B, vol. 39, no. 1, pp. 1–38, 1977.

[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.

[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.

[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.

[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.

[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.

[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.

[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.

[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.

[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.

[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.

[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.



The platform is composed of three main parts a rotatingpart a load profile generation part and a measurement partas illustrated in Figure 6

The rotating part is composed of an asynchronous motorwhich develops a power equal to 250W two shafts and agearbox which allows the motor to reach its rated speed of2830 rpm The motorrsquos rotation speed and the direction areset through a human machine interface

The load profiles part issues a radial force on the externalring of the test bearing through a pneumatic jack connectedto a lever arm which indirectly transmits the load througha clamping ring The goal of the applied radial force is toaccelerate the bearingrsquos degradation process

Table 3 Average absolute prediction error (APE) of the RUL esti-mation using the state duration estimator of (16) introduced byAzimi et al [30ndash32]

(a) APE of the RUL estimation for the continuous observation test cases

Testcase

Duration distributionGaussian Gamma Weibull

APE APE APE APE APE APE APE APE APEavg up low avg up low avg up low

1 578 510 668 262 97 527 259 284 6462 502 444 577 213 170 469 290 192 7083 503 447 573 271 87 565 345 139 7344 518 460 604 213 143 459 349 171 7875 594 537 662 290 95 554 334 156 7496 580 517 671 258 83 541 231 258 6657 594 536 669 182 125 477 360 171 7448 634 556 723 194 157 441 348 178 7709 491 435 570 145 171 432 251 267 67010 544 484 628 232 79 527 241 245 674Average 554 493 635 226 121 499 301 206 715

(b) APE of the RUL estimation for the discrete observation test cases

Testcase

Duration distributionGaussian Gamma Weibull

APE APE APE APE APE APE APE APE APEavg up low avg up low avg up low

1 514 410 624 424 318 530 326 264 7362 496 399 604 595 483 708 313 276 6933 502 386 623 465 357 574 324 257 7024 422 315 538 501 405 606 237 361 6035 443 339 558 478 374 591 360 256 7656 522 432 627 552 443 669 272 316 6437 550 439 668 560 457 670 347 232 7448 503 390 620 604 505 710 351 264 7249 555 474 640 480 372 595 318 222 73610 490 382 607 521 412 631 294 289 687Average 500 397 611 518 413 629 314 274 704

Themeasurement part consists of a data acquisition cardconnected to the monitoring sensors which provides theuser with the measured temperature and vibration data Thevibration measurements are provided in snapshots of 01 scollected each 10 seconds at a sampling frequency of 256 kHz(2560 samples per each snapshot) while the temperature hasbeen continuously recorded at a sampling frequency of 10Hz(600 samples collected each minute)

Further details on the Pronostia test rig can be found onthe data presentation paper [49] and on the web page of thedata challenge (httpwwwfemto-stfrenResearch-depart-mentsAS2MResearch-groupsPHMIEEE-PHM-2012-Data-challengephp)

16 Mathematical Problems in Engineering

Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1 (1800 rpm and 4000 N): Bearing1_1 28030, Bearing1_2 8710, Bearing1_3 23750, Bearing1_4 14280, Bearing1_5 24630, Bearing1_6 24480, Bearing1_7 22590
Condition 2 (1650 rpm and 4200 N): Bearing2_1 9110, Bearing2_2 7970, Bearing2_3 19550, Bearing2_4 7510, Bearing2_5 23110, Bearing2_6 7010, Bearing2_7 2300
Condition 3 (1500 rpm and 5000 N): Bearing3_1 5150, Bearing3_2 16370, Bearing3_3 4340

Figure 6: Global overview of the Pronostia experimental platform [19] (rotating module, load module, tested bearing, and data acquisition module).

Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;

(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;

(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal was higher than 20 g; thus, this moment was defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.
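As an illustration of this stopping criterion, the failure time can be located directly from the vibration record. The sketch below is a minimal example, assuming the signal is available as a NumPy array expressed in g; the function name and variables are illustrative and not part of the original formulation.

```python
import numpy as np

def failure_index(vibration, threshold_g=20.0):
    """Return the index of the first sample whose amplitude exceeds the
    20 g stopping criterion used to define the bearing failure time."""
    over = np.abs(vibration) > threshold_g
    if not over.any():
        return None  # the bearing never reached the stopping criterion
    return int(np.argmax(over))  # argmax gives the first True position
```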

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].

In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots (L = 2560). Let r_w(t) be the raw signal of the w-th window; for each w we estimate RMS as

  x_w^{RMS} = \sqrt{(1/L) \sum_{t=1}^{L} r_w^2(t)}

and kurtosis as

  x_w^{KURT} = \frac{(1/L) \sum_{t=1}^{L} (r_w(t) - \bar{r}_w)^4}{\big[(1/L) \sum_{t=1}^{L} (r_w(t) - \bar{r}_w)^2\big]^2},

where \bar{r}_w is the mean of r_w. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM, after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
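As a concrete illustration of the windowed RMS and kurtosis features defined above, the following minimal sketch computes one feature vector per snapshot. It assumes the horizontal accelerometer signal has already been cut into windows of length L = 2560 stored as rows of a NumPy array; the names are illustrative.

```python
import numpy as np

def rms_kurtosis_features(windows):
    """windows: array of shape (n_windows, L) with the raw signal r_w(t).
    Returns an (n_windows, 2) array with [RMS, kurtosis] per window."""
    L = windows.shape[1]
    rms = np.sqrt(np.sum(windows ** 2, axis=1) / L)
    centered = windows - windows.mean(axis=1, keepdims=True)
    m2 = np.sum(centered ** 2, axis=1) / L   # second central moment
    m4 = np.sum(centered ** 4, axis=1) / L   # fourth central moment
    kurt = m4 / m2 ** 2                      # (non-excess) kurtosis
    return np.column_stack([rms, kurt])

# Example: one observation vector x_w = [RMS, kurtosis] per 0.1 s snapshot
# features = rms_kurtosis_features(raw_signal.reshape(-1, 2560))
```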

5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (being Gaussian, Gamma, or Weibull), (ii) an increasing number of states N from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations λ_0, on the data sets (Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].

Figure 8: Raw vibration data (a) versus RMS and kurtosis features (b) for Bearing1_1.

Similarly to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 Gaussian mixture for the observation density.
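The model selection loop of part (A) can be sketched as follows. Here train_hsmm and aic are placeholders for the HSMM learning routine and the AIC of Equation (46), which belong to the methodology of this paper but are not reproduced in code here; the loop simply retains the structure with the smallest AIC over duration families, numbers of states, mixture sizes, and random restarts.

```python
import itertools
import numpy as np

def select_structure(train_sequences, train_hsmm, aic,
                     families=("gaussian", "gamma", "weibull"),
                     n_states=range(2, 7), n_mixtures=range(1, 5),
                     n_restarts=120, seed=0):
    """Grid search over HSMM structures; returns (AIC, family, N, M, params)
    with the minimum AIC, mirroring the procedure of part (A)."""
    rng = np.random.default_rng(seed)
    best = None
    for family, N, M in itertools.product(families, n_states, n_mixtures):
        for _ in range(n_restarts):  # 120 random initializations of λ0
            params, loglik = train_hsmm(train_sequences, family, N, M,
                                        seed=int(rng.integers(1 << 31)))
            score = aic(loglik, family, N, M)  # Equation (46)
            if best is None or score < best[0]:
                best = (score, family, N, M, params)
    return best
```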

(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme by using, for condition 1, at each iteration, Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As it can be seen, the average as well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.

Figure 9: AIC values versus number of states for condition 1 (a) and condition 2 (b), for the Gaussian, Gamma, and Weibull duration models. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.

Figure 10: RUL estimation (true, average, upper, and lower RUL versus time) for Bearing1_7 (a) and Bearing2_6 (b). By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.

Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As it can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1, and it further decreases to 14 minutes for condition 2.
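The values in Table 5 follow directly from the error definition of (63) and (64); a minimal sketch, assuming the true and predicted RUL histories are available as aligned arrays in seconds:

```python
import numpy as np

def average_absolute_prediction_error(rul_real, rul_pred):
    """APE(t) = |RUL_real(t) - RUL_pred(t)|, averaged over the test history."""
    ape_t = np.abs(np.asarray(rul_real) - np.asarray(rul_pred))
    return float(ape_t.mean())
```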

Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1

Test Bearing    APEavg    APElow    APEup
Bearing1_1      10571.6   12723.0    9414.6
Bearing1_2       4331.2    3815.6    3821.3
Bearing1_3       2997.0    9730.9    6091.2
Bearing1_4       6336.3    2876.6   14871.9
Bearing1_5       1968.9    7448.4   10411.5
Bearing1_6       4253.0    9896.4    9793.7
Bearing1_7       1388.0    7494.3   10088.1
Average          4549.4    7712.2    9213.2

(b) Condition 2

Test Bearing    APEavg    APElow    APEup
Bearing2_1       2475.9    5006.5    7287.5
Bearing2_2       1647.3    4497.2    8288.6
Bearing2_3       8877.1    9508.3    7962.1
Bearing2_4       1769.8    4248.6    4982.5
Bearing2_5       8663.1   10490.0   10730.0
Bearing2_6        877.1    3504.7    6687.0
Bearing2_7       3012.5    3866.4    6651.9
Average          3903.3    5874.5    7512.8

6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time to event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few parameters required to be estimated. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as

  \hat{d}_{t+1}(i) = \frac{a_{ii}(\hat{d}_t) \cdot \alpha_t(i) \cdot b_i(x_{t+1})}{\alpha_{t+1}(i)} \cdot (\hat{d}_t(i) + 1).    (A.1)

The random variable d_t(i) has been defined in Section 2.1 as the duration spent in state i prior to the current time t, assuming that the state at the current time t is i; d_t(i) is sampled from an arbitrary distribution

  d_t(i) \sim f(d).    (A.2)

We can specify the probability that the system has been in state i for d time units prior to the current time t, given the observations and the model parameters \lambda, and knowing that the current state is i, as

  P(d_t(i) = d) = P(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t, \lambda).    (A.3)

We omit the conditioning on the model parameters \lambda in the following equations, being inherently implied. We are interested in deriving the estimator \hat{d}_t(i) of d_t(i), defined as its expected value (see Equation (15)):

  \hat{d}_t(i) = E(d_t(i) \mid s_t = S_i, x_1 x_2 \cdots x_t),  1 \le i \le N.    (A.4)

From the definition of expectation we have

  \hat{d}_t(i) = \sum_{d=1}^{t} d \cdot P(d_t(i) = d)
             = \sum_{d=1}^{t} d \cdot P(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t).    (A.5)

For \hat{d}_{t+1}(i) we have

  \hat{d}_{t+1}(i) = \sum_{d=1}^{t+1} d \cdot P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})
                  = \underbrace{P(s_{t-1} \neq S_i, s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})}_{(a)}    (A.6)
                    + \sum_{d=2}^{t+1} d \cdot P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1}).    (A.7)

By noticing that

  P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, x_1, \ldots, x_{t+1})
  = \frac{P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})}{P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})},    (A.8)

we can replace the probability of the second term of (A.7) with

  P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})
  = \underbrace{P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})}_{(b)}    (A.9)
    \cdot P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, x_1, \ldots, x_{t+1}).    (A.10)

In the last factor of (A.10) we can omit the information about the current state and observation by observing that

  P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, x_1, \ldots, x_{t+1})
  \approx \underbrace{P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t)}_{(c)},    (A.11)

if the following independencies hold:

  s_{t+1} \perp s_{t-d+1}, \ldots, s_{t-1} \mid s_t, x_1, \ldots, x_t,
  X_{t+1} \perp s_{t-d+1}, \ldots, s_{t-1} \mid s_t, x_1, \ldots, x_t,    (A.12)

where with \perp we denote independency. Equations (A.12) hold for HMMs (even without conditioning on x_1, \ldots, x_t), but they do not hold for HSMMs, since the state duration (expressed by s_{t-d+1}, \ldots, s_{t-1}) determines the system evolution. On the other hand, the state duration is partially known through the observations x_1, \ldots, x_t. Thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A.6), (A.9), and (A.11) we obtain

  \hat{d}_{t+1}(i) = (a) + \sum_{d=2}^{t+1} d \cdot (b) \cdot (c)
  = P(s_{t-1} \neq S_i, s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})
    + \sum_{d=2}^{t+1} d \cdot \underbrace{P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})}_{\text{does not depend on } d}
      \cdot P(s_{t-d} \neq S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t)

  (using P(A, B \mid C) = P(A \mid B, C) \cdot P(B \mid C) on the first term)

  = P(s_{t-1} \neq S_i \mid s_t = S_i, s_{t+1} = S_i, x_1, \ldots, x_{t+1}) \cdot P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})
    + P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1}) \cdot \sum_{d=2}^{t+1} d \cdot P(s_{t-d} \neq S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t)
  = P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})
    \cdot \Big[ \underbrace{P(s_{t-1} \neq S_i \mid s_t = S_i, s_{t+1} = S_i, x_1, \ldots, x_t, x_{t+1})}_{\approx P(s_{t-1} \neq S_i \mid s_t = S_i, x_1, \ldots, x_t) \text{ by the approximation of (A.11)}}
      + \sum_{d=2}^{t+1} d \cdot P(s_{t-d} \neq S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t) \Big]
  = P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1})
    \cdot \Big[ P(s_{t-1} \neq S_i \mid s_t = S_i, x_1, \ldots, x_t)
      + \sum_{d'=1}^{t} (d' + 1) \cdot P(s_{t-d'-1} \neq S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t) \Big].    (A.13)

Noticing that

  \sum_{d'=1}^{t} P(s_{t-d'-1} \neq S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, x_1, \ldots, x_t) + P(s_{t-1} \neq S_i \mid s_t = S_i, x_1, \ldots, x_t) = 1,    (A.14)

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time t, we can rewrite (A.13) as follows:

  \hat{d}_{t+1}(i) = P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1}) \cdot (\hat{d}_t(i) + 1).    (A.15)

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state i in the previous step.

In order to transform (A.15) in terms of model parameters, for an easy numerical calculation of the induction for \hat{d}_{t+1}(i), we can consider the following equality:

  P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1}) = \frac{P(s_t = S_i, s_{t+1} = S_i \mid x_1, \ldots, x_{t+1})}{\underbrace{P(s_{t+1} = S_i \mid x_1, \ldots, x_{t+1})}_{\gamma_{t+1}(i)}}.    (A.16)

If we consider the terms involved in the probability at the numerator of the right-hand side of (A.16), we have that

  \underbrace{x_1, \ldots, x_t}_{B} \perp \underbrace{x_{t+1}}_{C} \mid \underbrace{s_t = S_i, s_{t+1} = S_i}_{A}.    (A.17)

If B \perp C \mid A, by the Bayes rule we have that

  P(A \mid C, B) = \frac{P(C \mid A, B) \cdot P(A \mid B)}{P(C \mid B)}.    (A.18)

Hence we can rewrite the numerator of the right-hand side of (A.16) as follows:

  P(s_t = S_i, s_{t+1} = S_i \mid x_1, \ldots, x_{t+1})
  = \frac{P(s_t = S_i, s_{t+1} = S_i \mid x_1, \ldots, x_t) \cdot \overbrace{P(x_{t+1} \mid s_t = S_i, s_{t+1} = S_i)}^{x_{t+1} \perp s_t \mid s_{t+1}}}{P(x_{t+1} \mid x_1, \ldots, x_t)}
  = \frac{P(s_{t+1} = S_i \mid s_t = S_i, x_1, \ldots, x_t) \cdot \overbrace{P(s_t = S_i \mid x_1, \ldots, x_t)}^{\gamma_t(i)} \cdot \overbrace{P(x_{t+1} \mid s_{t+1} = S_i)}^{b_i(x_{t+1})}}{P(x_{t+1} \mid x_1, \ldots, x_t)}.    (A.19)

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

  P(s_{t+1} = S_i \mid s_t = S_i, x_1, \ldots, x_t) = \sum_{d_t} a_{ii}(d_t) \cdot P(d_t \mid x_1, \ldots, x_t) \approx a_{ii}(\hat{d}_t),    (A.20)

while the denominator of (A.19) can be expressed as follows:

  P(x_{t+1} \mid x_1, \ldots, x_t) = \frac{P(x_1, \ldots, x_t, x_{t+1})}{P(x_1, \ldots, x_t)} = \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}.    (A.21)

By substituting (A.20) and (A.21) in (A.19), we obtain

  P(s_t = S_i, s_{t+1} = S_i \mid x_1, \ldots, x_{t+1}) = \frac{a_{ii}(\hat{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(x_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)},    (A.22)

and then, by combining (A.22) and (A.16), we obtain

  P(s_t = S_i \mid s_{t+1} = S_i, x_1, \ldots, x_{t+1}) = \frac{a_{ii}(\hat{d}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(x_{t+1})}{\gamma_{t+1}(i) \cdot \sum_{i=1}^{N} \alpha_{t+1}(i)}.    (A.23)

Finally, by substituting (A.23) in (A.15) and considering that

  \gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)},    (A.24)

we derive the induction formula for \hat{d}_{t+1}(i) in terms of model parameters as

  \hat{d}_{t+1}(i) = \frac{a_{ii}(\hat{d}_t) \cdot \alpha_t(i) \cdot b_i(x_{t+1})}{\alpha_{t+1}(i)} \cdot (\hat{d}_t(i) + 1).    (A.25)
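For completeness, the induction (A.25) translates directly into a per-time-step update. The sketch below assumes the forward variables α_t(i) and α_{t+1}(i), the duration-dependent self-transitions a_ii(d̂_t), and the observation likelihoods b_i(x_{t+1}) have already been computed by the forward pass; the array names are illustrative.

```python
import numpy as np

def update_duration_estimates(d_hat, alpha_t, alpha_t1, a_self, b_next):
    """One step of (A.25):
    d_hat_{t+1}(i) = a_ii(d_hat_t) * alpha_t(i) * b_i(x_{t+1}) / alpha_{t+1}(i) * (d_hat_t(i) + 1)

    d_hat    : (N,) current estimates d̂_t(i)
    alpha_t  : (N,) forward variables α_t(i)
    alpha_t1 : (N,) forward variables α_{t+1}(i)
    a_self   : (N,) self-transition probabilities a_ii(d̂_t)
    b_next   : (N,) observation likelihoods b_i(x_{t+1})
    """
    # guard against division by zero when a state has negligible forward mass
    weight = a_self * alpha_t * b_next / np.maximum(alpha_t1, np.finfo(float).tiny)
    return weight * (d_hat + 1.0)
```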

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka, Japan and Kariya, Japan, October-November 1989), E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.

[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.

[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.

[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.

[5] AFNOR, "Condition monitoring and diagnostics of machines - prognostics - part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.

[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.

[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.

[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.

[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.

[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.

[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.

[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.

[13] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.

[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.

[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.

[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.

[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.

[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.

[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.

[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.

[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.

[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.

[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.

[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.

[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.

[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.

[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.

[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.

[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.

[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.

[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.

[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.

[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.

[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.

[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.

[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.

[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.

[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.

[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.

[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.

[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.

[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.

[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.

[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.

[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.

[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 2006.

[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.

[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.

[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.

[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.

[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.

[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.

[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.

[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.

[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.

[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.

[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.

[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.

[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.



(b) APE of the RUL estimation for the discrete observation test cases

Testcase

Duration distributionGaussian Gamma Weibull

APE APE APE APE APE APE APE APE APEavg up low avg up low avg up low

1 21 110 140 31 88 140 24 120 1302 21 110 130 110 220 33 190 320 713 51 170 76 66 180 51 23 140 1104 59 65 180 52 170 67 42 160 905 32 140 100 83 190 34 120 240 296 120 240 27 62 180 52 41 84 1607 29 150 97 93 210 23 190 310 668 150 270 70 74 180 43 43 170 949 59 180 77 110 230 55 39 160 8810 35 110 140 55 60 160 52 170 71Average 57 150 100 74 170 66 77 190 90

housing a temperature probe and two accelerometers (oneon the vertical and one on the horizontal axis)

The platform is composed of three main parts a rotatingpart a load profile generation part and a measurement partas illustrated in Figure 6

The rotating part is composed of an asynchronous motorwhich develops a power equal to 250W two shafts and agearbox which allows the motor to reach its rated speed of2830 rpm The motorrsquos rotation speed and the direction areset through a human machine interface

The load profiles part issues a radial force on the externalring of the test bearing through a pneumatic jack connectedto a lever arm which indirectly transmits the load througha clamping ring The goal of the applied radial force is toaccelerate the bearingrsquos degradation process

Table 3 Average absolute prediction error (APE) of the RUL esti-mation using the state duration estimator of (16) introduced byAzimi et al [30ndash32]

(a) APE of the RUL estimation for the continuous observation test cases

Testcase

Duration distributionGaussian Gamma Weibull

APE APE APE APE APE APE APE APE APEavg up low avg up low avg up low

1 578 510 668 262 97 527 259 284 6462 502 444 577 213 170 469 290 192 7083 503 447 573 271 87 565 345 139 7344 518 460 604 213 143 459 349 171 7875 594 537 662 290 95 554 334 156 7496 580 517 671 258 83 541 231 258 6657 594 536 669 182 125 477 360 171 7448 634 556 723 194 157 441 348 178 7709 491 435 570 145 171 432 251 267 67010 544 484 628 232 79 527 241 245 674Average 554 493 635 226 121 499 301 206 715

(b) APE of the RUL estimation for the discrete observation test cases

Testcase

Duration distributionGaussian Gamma Weibull

APE APE APE APE APE APE APE APE APEavg up low avg up low avg up low

1 514 410 624 424 318 530 326 264 7362 496 399 604 595 483 708 313 276 6933 502 386 623 465 357 574 324 257 7024 422 315 538 501 405 606 237 361 6035 443 339 558 478 374 591 360 256 7656 522 432 627 552 443 669 272 316 6437 550 439 668 560 457 670 347 232 7448 503 390 620 604 505 710 351 264 7249 555 474 640 480 372 595 318 222 73610 490 382 607 521 412 631 294 289 687Average 500 397 611 518 413 629 314 274 704

Themeasurement part consists of a data acquisition cardconnected to the monitoring sensors which provides theuser with the measured temperature and vibration data Thevibration measurements are provided in snapshots of 01 scollected each 10 seconds at a sampling frequency of 256 kHz(2560 samples per each snapshot) while the temperature hasbeen continuously recorded at a sampling frequency of 10Hz(600 samples collected each minute)

Further details on the Pronostia test rig can be found onthe data presentation paper [49] and on the web page of thedata challenge (httpwwwfemto-stfrenResearch-depart-mentsAS2MResearch-groupsPHMIEEE-PHM-2012-Data-challengephp)

16 Mathematical Problems in Engineering

Table 4 Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset[49]

Condition 1 Condition 2 Condition 31800 rpm and 4000N 1650 rpm and 4200N 1500 rpm and 5000N

Bearing Lifetime [s] Bearing Lifetime [s] Bearing Lifetime [s]Bearing1 1 28030 Bearing2 1 9110 Bearing3 1 5150Bearing1 2 8710 Bearing2 2 7970 Bearing3 2 16370Bearing1 3 23750 Bearing2 3 19550 Bearing3 3 4340Bearing1 4 14280 Bearing2 4 7510Bearing1 5 24630 Bearing2 5 23110Bearing1 6 24480 Bearing2 6 7010Bearing1 7 22590 Bearing2 7 2300

Loadmodule

Testedbearing

Dataacquisitionmodule

Rotatingmodule

Figure 6 Global overview of the Pronostia experimental platform[19]

Regarding the data provided for the PHM 2012 challenge3 different operating conditions were considered

(i) first operating conditions speed of 1800 rpm and loadof 4000 Newton

(ii) second operating conditions speed of 1650 rpm andload of 4200 Newton

(iii) third operating conditions speed of 1500 rpm andload of 5000 Newton

Under the above operating conditions a total of 17 accel-erated life tests were realized on bearings of type NSK 6804DD which can operate at a maximum speed of 13000 rpmand a load limit of 4000N The tests were stopped whenthe amplitude of the vibration signal was higher than 20 gthus this moment was defined as the bearing failure time Anexample of bearing before and after the experiment is shownin Figure 7 together with the corresponding vibration signalcollected during the whole test

Table 4 reports how the 17 tested bearings were separatedinto the three operating conditions Moreover the durationof each experiment being the RUL to be predicted for eachbearing is also given We performed two sets of experimentsby considering respectively the bearings relative to thefirst and the second operating condition (ie Bearing1 1Bearing1 2 Bearing1 7 and Bearing2 1 Bearing2 2 Bearing2 7)

As already mentioned the available data correspondto normally degraded bearings meaning that the defects

were not initially induced and that each degraded bearingcontains almost all the types of defects (balls rings and cage)resembling faithfully a common real industrial situationMoreover no assumption about the type of failure to beoccurred is provided with the data and since the variabilityin experiment durations is high (from 1 h to 7 h) performinggood estimates of the RUL is a difficult task [49]

In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate the RMS as $x^{\mathrm{RMS}}_w = \sqrt{(1/L)\sum_{t=1}^{L} r_w^2(t)}$ and the kurtosis as $x^{\mathrm{KURT}}_w = \big((1/L)\sum_{t=1}^{L}(r_w(t)-\bar r_w)^4\big) \big/ \big((1/L)\sum_{t=1}^{L}(r_w(t)-\bar r_w)^2\big)^2$, where $\bar r_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8.

To assess the performance of the proposed HSMM, after the model selection procedure we implemented a leave-one-out cross validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
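A minimal sketch of this windowed feature extraction is given below; it assumes the raw horizontal-accelerometer signal is available as a NumPy array and uses the window length $L = 2560$ quoted above. The function name is illustrative and not taken from the original implementation.

```python
import numpy as np

def rms_kurtosis_features(raw_signal, window_len=2560):
    """Compute per-window RMS and (non-excess) kurtosis, as defined in the text."""
    n_windows = len(raw_signal) // window_len
    windows = raw_signal[: n_windows * window_len].reshape(n_windows, window_len)

    rms = np.sqrt(np.mean(windows ** 2, axis=1))

    centred = windows - windows.mean(axis=1, keepdims=True)
    m2 = np.mean(centred ** 2, axis=1)      # second central moment
    m4 = np.mean(centred ** 4, axis=1)      # fourth central moment
    kurtosis = m4 / m2 ** 2                 # ratio used in the text

    return np.column_stack([rms, kurtosis])  # shape (n_windows, 2)

# Example: features for a synthetic vibration record of 50 windows.
signal = np.random.default_rng(1).standard_normal(2560 * 50)
print(rms_kurtosis_features(signal).shape)   # (50, 2)
```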

5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure and, secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.
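The overall evaluation protocol (train on six bearings, test online on the held-out one, and score with the average APE of Equation (64)) can be organised as in the following skeleton. Here `train_fn` and `predict_fn` are placeholders standing in for the paper's learning and RUL-prediction routines, and the RUL is measured in time steps, so this is only a sketch of the protocol, not the authors' code.

```python
def leave_one_out_rul(bearing_histories, train_fn, predict_fn):
    """Leave-one-out RUL evaluation over a list of bearing feature sequences."""
    all_errors = []
    for i, test_seq in enumerate(bearing_histories):
        train_seqs = [s for j, s in enumerate(bearing_histories) if j != i]
        model = train_fn(train_seqs)                         # fit HSMM on the other bearings
        true_rul = [len(test_seq) - (t + 1) for t in range(len(test_seq))]
        pred_rul = [predict_fn(model, test_seq[: t + 1])     # online prediction at time t
                    for t in range(len(test_seq))]
        ape = [abs(tr - pr) for tr, pr in zip(true_rul, pred_rul)]
        all_errors.append(sum(ape) / len(ape))               # average APE, cf. Eq. (64)
    return all_errors
```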

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (Gaussian, Gamma, or Weibull), (ii) an increasing number of states $N$ from 2 to 6, and (iii) an increasing number of Gaussian mixtures $M$ in the observation density from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings corresponding to 120 random initializations $\lambda_0$ on the data sets (Bearing1_1,


Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].

Figure 8: Raw vibration data (a) versus extracted RMS and kurtosis features (b) for Bearing1_1: the raw signal r(t) is split into windows of length L, and one RMS value and one kurtosis value are computed per window.

Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with $N = 4$ states, a Weibull duration model, and an $M = 1$ Gaussian mixture for the observation density.
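The structure search described above amounts to a grid over duration families, number of states, and number of mixtures, keeping the configuration with the lowest AIC over many random restarts. The sketch below assumes hypothetical `fit` and `aic` callables standing in for the paper's learning algorithm and Equation (46); it only illustrates the selection loop.

```python
import itertools

def select_structure(train_seqs, fit, aic, n_restarts=120):
    """Grid search over HSMM structures, keeping the minimum-AIC model."""
    families = ["gaussian", "gamma", "weibull"]        # candidate duration distributions
    best = None
    for family, n_states, n_mix in itertools.product(families, range(2, 7), range(1, 5)):
        for seed in range(n_restarts):                 # random initialisations lambda_0
            model = fit(train_seqs, family, n_states, n_mix, seed)
            score = aic(model, train_seqs)             # Akaike Information Criterion
            if best is None or score < best[0]:
                best = (score, family, n_states, n_mix, model)
    return best                                        # lowest-AIC configuration
```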

(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme, by using for condition 1, at each iteration, Bearing1_i, $1 \le i \le 7$, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters $\lambda^*_i$ were estimated for the $i$th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time $t$, the average, lower, and upper RUL, as specified in (57), (58), and (59), respectively, considering the state $S_4$ as the failure state. The same procedure has been performed for the bearings in condition 2.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As it can be seen, the average as


Figure 9: AIC values versus the number of states for Gaussian, Gamma, and Weibull duration models: (a) condition 1; (b) condition 2. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.

Figure 10: True, average, upper, and lower RUL versus time: (a) RUL estimation for Bearing1_7; (b) RUL estimation for Bearing2_6. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.

well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.

Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As it can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric takes into account also the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the overall average, while the best average error of the mean RUL is only 23 minutes for condition 1 and further decreases to 14 minutes for condition 2.

6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time to event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required


Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1
Test bearing   APEavg    APElow    APEup
Bearing1_1     10571.6   12723.0    9414.6
Bearing1_2      4331.2    3815.6    3821.3
Bearing1_3      2997.0    9730.9    6091.2
Bearing1_4      6336.3    2876.6   14871.9
Bearing1_5      1968.9    7448.4   10411.5
Bearing1_6      4253.0    9896.4    9793.7
Bearing1_7      1388.0    7494.3   10088.1
Average         4549.4    7712.2    9213.2

(b) Condition 2
Test bearing   APEavg    APElow    APEup
Bearing2_1      2475.9    5006.5    7287.5
Bearing2_2      1647.3    4497.2    8288.6
Bearing2_3      8877.1    9508.3    7962.1
Bearing2_4      1769.8    4248.6    4982.5
Bearing2_5      8663.1   10490.0   10730.0
Bearing2_6       877.1    3504.7    6687.0
Bearing2_7      3012.5    3866.4    6651.9
Average         3903.3    5874.5    7512.8

parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a given event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as
$$\bar d_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf d}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf x_{t+1})}{\alpha_{t+1}(i)}\cdot\left(\bar d_t(i)+1\right). \quad (\text{A.1})$$

The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution:
$$d_t(i) \sim f(d). \quad (\text{A.2})$$

We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as
$$P\left(d_t(i)=d\right) = P\left(s_{t-d-1}\neq S_i,\, s_{t-d}=S_i,\ldots, s_{t-1}=S_i \mid s_t=S_i,\, \mathbf x_1,\ldots,\mathbf x_t,\, \lambda\right). \quad (\text{A.3})$$

We omit the conditioning on the model parameters $\lambda$ in the following equations, it being implicitly understood. We are interested in deriving the estimator $\bar d_t(i)$ of $d_t(i)$, defined as its expected value (see Equation (15)):
$$\bar d_t(i) = \mathbb{E}\left(d_t(i) \mid s_t=S_i,\, \mathbf x_1\mathbf x_2\cdots\mathbf x_t\right), \quad 1\le i\le N. \quad (\text{A.4})$$

From the definition of expectation we have
$$\bar d_t(i) = \sum_{d=1}^{t} d\cdot P\left(d_t(i)=d\right) = \sum_{d=1}^{t} d\cdot P\left(s_{t-d-1}\neq S_i,\, s_{t-d}=S_i,\ldots, s_{t-1}=S_i \mid s_t=S_i,\, \mathbf x_1,\ldots,\mathbf x_t\right). \quad (\text{A.5})$$

For $\bar d_{t+1}(i)$ we have
$$\begin{aligned}
\bar d_{t+1}(i) &= \sum_{d=1}^{t+1} d\cdot P\left(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots, s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)\\
&= \underbrace{P\left(s_{t-1}\neq S_i,\, s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)}_{(a)} \quad (\text{A.6})\\
&\quad + \sum_{d=2}^{t+1} d\cdot P\left(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots, s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right). \quad (\text{A.7})
\end{aligned}$$

By noticing that
$$P\left(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots, s_{t-1}=S_i \mid s_t=S_i,\, s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right) = \frac{P\left(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots, s_{t-1}=S_i,\, s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)}{P\left(s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)}, \quad (\text{A.8})$$


we can replace the probability in the second term of (A.7) with
$$\begin{aligned}
&P\left(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots, s_{t-1}=S_i,\, s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)\\
&\quad = \underbrace{P\left(s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)}_{(b)} \quad (\text{A.9})\\
&\qquad \cdot\, P\left(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots, s_{t-1}=S_i \mid s_t=S_i,\, s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right). \quad (\text{A.10})
\end{aligned}$$

In the last factor of (A.10) we can omit the information about the current state and observation by observing that
$$P\left(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots, s_{t-1}=S_i \mid s_t=S_i,\, s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right) \approx \underbrace{P\left(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots, s_{t-1}=S_i \mid s_t=S_i,\, \mathbf x_1,\ldots,\mathbf x_t\right)}_{(c)}, \quad (\text{A.11})$$

if the following independencies hold:
$$s_{t+1} \perp s_{t-d+1},\ldots,s_{t-1} \mid s_t,\, \mathbf x_1,\ldots,\mathbf x_t,$$
$$\mathbf x_{t+1} \perp s_{t-d+1},\ldots,s_{t-1} \mid s_t,\, \mathbf x_1,\ldots,\mathbf x_t, \quad (\text{A.12})$$

where with $\perp$ we denote independence. The relations in (A.12) hold for HMMs (even without conditioning on $\mathbf x_1,\ldots,\mathbf x_t$), but they do not hold for HSMMs, since the state duration (expressed by $s_{t-d+1},\ldots,s_{t-1}$) determines the system evolution. On the other hand, the state duration is partially known through the observations $\mathbf x_1,\ldots,\mathbf x_t$; thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A.6), (A.9), and (A.11) we obtain

$$\begin{aligned}
\bar d_{t+1}(i) &= (a) + \sum_{d=2}^{t+1} d\cdot (b)\cdot (c)\\
&= \underbrace{P\left(s_{t-1}\neq S_i,\, s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)}_{P(A,B\mid C)=P(A\mid B,C)\cdot P(B\mid C)}\\
&\quad + \sum_{d=2}^{t+1} d\cdot \underbrace{P\left(s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)}_{\text{does not depend on } d}\cdot P\left(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots, s_{t-1}=S_i \mid s_t=S_i,\, \mathbf x_1,\ldots,\mathbf x_t\right)\\
&= P\left(s_{t-1}\neq S_i \mid s_t=S_i,\, s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)\cdot P\left(s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)\\
&\quad + P\left(s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)\cdot \sum_{d=2}^{t+1} d\cdot P\left(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots, s_{t-1}=S_i \mid s_t=S_i,\, \mathbf x_1,\ldots,\mathbf x_t\right)\\
&= P\left(s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)\cdot\Bigg[\underbrace{P\left(s_{t-1}\neq S_i \mid s_t=S_i,\, s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_t,\mathbf x_{t+1}\right)}_{\text{approximated as in (A.11)}}\\
&\qquad + \sum_{d=2}^{t+1} d\cdot P\left(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots, s_{t-1}=S_i \mid s_t=S_i,\, \mathbf x_1,\ldots,\mathbf x_t\right)\Bigg]\\
&= P\left(s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)\cdot\Bigg[P\left(s_{t-1}\neq S_i \mid s_t=S_i,\, \mathbf x_1,\ldots,\mathbf x_t\right)\\
&\qquad + \sum_{d'=1}^{t}\left(d'+1\right)\cdot P\left(s_{t-d'-1}\neq S_i,\, s_{t-d'}=S_i,\ldots, s_{t-1}=S_i \mid s_t=S_i,\, \mathbf x_1,\ldots,\mathbf x_t\right)\Bigg]. \quad (\text{A.13})
\end{aligned}$$

Noticing that
$$\sum_{d'=1}^{t} P\left(s_{t-d'-1}\neq S_i,\, s_{t-d'}=S_i,\ldots, s_{t-1}=S_i \mid s_t=S_i,\, \mathbf x_1,\ldots,\mathbf x_t\right) + P\left(s_{t-1}\neq S_i \mid s_t=S_i,\, \mathbf x_1,\ldots,\mathbf x_t\right) = 1, \quad (\text{A.14})$$

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A.13) as follows:
$$\bar d_{t+1}(i) = P\left(s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right)\cdot\left(\bar d_t(i)+1\right). \quad (\text{A.15})$$

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted by the "amount" of the current state that was already in state $i$ in the previous step.

In order to transform (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar d_{t+1}(i)$, we can consider the following equality:
$$P\left(s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right) = \frac{P\left(s_t=S_i,\, s_{t+1}=S_i \mid \mathbf x_1,\ldots,\mathbf x_{t+1}\right)}{\underbrace{P\left(s_{t+1}=S_i \mid \mathbf x_1,\ldots,\mathbf x_{t+1}\right)}_{\gamma_{t+1}(i)}}. \quad (\text{A.16})$$

If we consider the terms involved in the probability at the numerator of the right-hand side of (A.16), we have that
$$\underbrace{\mathbf x_1,\ldots,\mathbf x_t}_{B} \;\perp\; \underbrace{\mathbf x_{t+1}}_{C} \;\Big|\; \underbrace{s_t=S_i,\, s_{t+1}=S_i}_{A}. \quad (\text{A.17})$$


If $B \perp C \mid A$, by the Bayes rule we have that
$$P\left(A \mid C, B\right) = \frac{P\left(C \mid A, B\right)\cdot P\left(A \mid B\right)}{P\left(C \mid B\right)}. \quad (\text{A.18})$$

Hence we can rewrite the numerator of the right-hand side of (A.16) as follows:
$$\begin{aligned}
P\left(s_t=S_i,\, s_{t+1}=S_i \mid \mathbf x_1,\ldots,\mathbf x_{t+1}\right) &= \frac{P\left(s_t=S_i,\, s_{t+1}=S_i \mid \mathbf x_1,\ldots,\mathbf x_t\right)\cdot \overbrace{P\left(\mathbf x_{t+1} \mid s_t=S_i,\, s_{t+1}=S_i\right)}^{\mathbf x_{t+1}\,\perp\, s_t \,\mid\, s_{t+1}}}{P\left(\mathbf x_{t+1} \mid \mathbf x_1,\ldots,\mathbf x_t\right)}\\
&= \frac{P\left(s_{t+1}=S_i \mid s_t=S_i,\, \mathbf x_1,\ldots,\mathbf x_t\right)\cdot \overbrace{P\left(s_t=S_i \mid \mathbf x_1,\ldots,\mathbf x_t\right)}^{\gamma_t(i)}\cdot \overbrace{P\left(\mathbf x_{t+1} \mid s_{t+1}=S_i\right)}^{b_i(\mathbf x_{t+1})}}{P\left(\mathbf x_{t+1} \mid \mathbf x_1,\ldots,\mathbf x_t\right)}. \quad (\text{A.19})
\end{aligned}$$

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as
$$P\left(s_{t+1}=S_i \mid s_t=S_i,\, \mathbf x_1,\ldots,\mathbf x_t\right) = \sum_{d_t} a_{ii}(d_t)\cdot P\left(d_t \mid \mathbf x_1,\ldots,\mathbf x_t\right) \approx a_{ii}(\bar{\mathbf d}_t), \quad (\text{A.20})$$

while the denominator of (A.19) can be expressed as follows:
$$P\left(\mathbf x_{t+1} \mid \mathbf x_1,\ldots,\mathbf x_t\right) = \frac{P\left(\mathbf x_1,\ldots,\mathbf x_t,\mathbf x_{t+1}\right)}{P\left(\mathbf x_1,\ldots,\mathbf x_t\right)} = \frac{\sum_{i=1}^{N}\alpha_{t+1}(i)}{\sum_{i=1}^{N}\alpha_t(i)}. \quad (\text{A.21})$$

By substituting (A.20) and (A.21) in (A.19) we obtain
$$P\left(s_t=S_i,\, s_{t+1}=S_i \mid \mathbf x_1,\ldots,\mathbf x_{t+1}\right) = \frac{a_{ii}(\bar{\mathbf d}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf x_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)}, \quad (\text{A.22})$$

and then, by combining (A.22) and (A.16), we obtain
$$P\left(s_t=S_i \mid s_{t+1}=S_i,\, \mathbf x_1,\ldots,\mathbf x_{t+1}\right) = \frac{a_{ii}(\bar{\mathbf d}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf x_{t+1})}{\gamma_{t+1}(i)\cdot\sum_{i=1}^{N}\alpha_{t+1}(i)}. \quad (\text{A.23})$$

Finally, by substituting (A.23) in (A.15) and considering that
$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)}, \quad (\text{A.24})$$
we derive the induction formula for $\bar d_{t+1}(i)$ in terms of model parameters as
$$\bar d_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf d}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf x_{t+1})}{\alpha_{t+1}(i)}\cdot\left(\bar d_t(i)+1\right). \quad (\text{A.25})$$
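For numerical illustration, the induction of (A.25) can be coded directly once the forward variables $\alpha_t(i)$, the duration-dependent self-transition $a_{ii}(\bar{\mathbf d}_t)$, and the emission likelihood $b_i(\mathbf x_{t+1})$ are available. The sketch below assumes these quantities are supplied as plain NumPy arrays and is not tied to the authors' implementation.

```python
import numpy as np

def update_average_duration(d_bar, alpha_t, alpha_t1, a_ii, b_x1):
    """One step of the induction (A.25) for all states i.

    d_bar:    current average durations, shape (N,)
    alpha_t:  forward variables at time t, shape (N,)
    alpha_t1: forward variables at time t+1, shape (N,)
    a_ii:     self-transition probabilities evaluated at the average durations, shape (N,)
    b_x1:     emission likelihoods b_i(x_{t+1}), shape (N,)
    """
    return (a_ii * alpha_t * b_x1) / alpha_t1 * (d_bar + 1.0)

# Example with arbitrary numbers for a three-state model.
d_next = update_average_duration(
    d_bar=np.array([2.0, 5.0, 1.0]),
    alpha_t=np.array([0.2, 0.5, 0.3]),
    alpha_t1=np.array([0.25, 0.45, 0.30]),
    a_ii=np.array([0.9, 0.8, 0.7]),
    b_x1=np.array([0.3, 0.4, 0.2]),
)
print(d_next)
```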

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka and Kariya, Japan, October-November 1989), E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling Industrial Maintenance Systems and the Effects of Automatic Condition Monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines - prognostics - part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-Based Failure Prediction: An Extended Hidden Markov Model Approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Automatic Control, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.

[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data Transmission Schemes for a New Generation of Interactive Digital Television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.

[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 2006.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.



Figure 3: Akaike Information Criterion (AIC) values versus the number of states: (a) continuous data, Gaussian duration distribution; (b) continuous data, Gamma duration distribution; (c) continuous data, Weibull duration distribution; (d) discrete data, Gaussian duration distribution; (e) discrete data, Gamma duration distribution; (f) discrete data, Weibull duration distribution. AIC is effective for automatic model selection, since its minimum value provides the same number of states and duration model used to generate the data.

An example of execution of the condition monitoring experiment is shown in Figure 4, for both continuous and discrete observations. In Figure 4(a) the state duration has been modeled with a Gamma distribution, while in Figure 4(b) with a Gaussian distribution. In Figures 4(a) and 4(b), the first display represents the true state of the HSMM, the second display represents the estimated state from the Viterbi algorithm, and the third display represents the observed time series.

Knowing the true state sequence, we calculated the accuracy, defined as the percentage of correctly estimated states over the total length of the state sequence, for each of


Figure 4: Condition monitoring using the Viterbi path: (a) state estimation for continuous data and Gamma duration distribution (Viterbi path accuracy 0.985); (b) state estimation for discrete data and Gaussian duration distribution (Viterbi path accuracy 0.992). Each panel shows the true state sequence, the Viterbi path with correct and wrong guesses, and the observations. HSMMs can be effective to solve condition monitoring problems in time-dependent applications due to their high accuracy in hidden state recognition.

the testing cases. The results are summarized in Table 1(a) for the continuous observations and in Table 1(b) for the discrete observations. The high percentage of correctly classified states shows that HSMMs can be effectively used to solve condition monitoring problems for applications in which the system shows a strong time and state duration dependency.
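The accuracy figure used here is simply the fraction of time steps at which the Viterbi state equals the true state; a minimal sketch (with synthetic sequences, not the paper's data) is:

```python
import numpy as np

def state_accuracy(true_states, viterbi_states):
    """Percentage of correctly estimated states over the sequence length."""
    true_states = np.asarray(true_states)
    viterbi_states = np.asarray(viterbi_states)
    return 100.0 * np.mean(true_states == viterbi_states)

print(state_accuracy([1, 1, 2, 3, 3], [1, 1, 2, 2, 3]))   # 80.0
```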

5.1.4. Remaining Useful Lifetime Estimation. In this experimental phase we considered the state $S_5$ as the failure state and the trained parameters $\lambda^*$ of Section 5.1.2 for each HSMM configuration. As the online RUL estimation procedure is intended to be used in real time, we simulated a condition monitoring experiment where we progressively consider the observations $\mathbf x_1 \mathbf x_2\cdots\mathbf x_t$ up to time $t$. When a new observation is acquired, the current state probability $\delta_t(i)$ is estimated (Equation (48)); afterwards, the calculation of the average, upper, and lower RUL ((57), (58), and (59)) is performed.
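Schematically, the online procedure alternates a filtering update with a RUL computation each time a new observation arrives. The routine below is only a sketch of that loop, with `filter_update` and `rul_bounds` standing in for Equations (48) and (57)-(59), which are defined earlier in the paper.

```python
def online_rul(observations, filter_update, rul_bounds, initial_belief):
    """Simulate the online RUL estimation loop over a monitoring sequence."""
    belief = initial_belief                   # current-state probabilities delta_t(i)
    history = []
    for x_t in observations:
        belief = filter_update(belief, x_t)   # update delta_t(i) with the new sample
        history.append(rul_bounds(belief))    # (average, lower, upper) RUL at time t
    return history
```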

Examples of RUL estimation are illustrated in Figure 5. In particular, Figure 5(a) represents the case of continuous observations and duration modeled by a Weibull distribution, while Figure 5(b) shows the case of discrete observations and duration modeled by a Gamma distribution. From the figures one can notice that the average, as well as the lower and the upper bound estimations, converges to the real RUL as the failure time approaches. Moreover, as expected, the uncertainty about the estimation decreases with time, since predictions performed at an early stage are more imprecise. As a consequence, the upper and the lower bounds become narrower as the failure state approaches, and the estimation becomes more precise until it converges to the actual RUL value, with the prediction error tending to zero at the end of the evaluation.

To quantitatively estimate the performance of our methodology for the RUL estimation, we considered, at each time $t$, the absolute prediction error (APE) between the real RUL and the predicted value, defined as
$$\text{APE}(t) = \left|\text{RUL}_{\text{real}}(t) - \text{RUL}(t)\right|, \quad (63)$$
where $\text{RUL}_{\text{real}}(t)$ is the (known) value of the RUL at time $t$, while $\text{RUL}(t)$ is the RUL predicted by the model. To evaluate the overall performance of our methodology, we considered the average absolute prediction error of the RUL estimation, defined as
$$\overline{\text{APE}} = \frac{\sum_{t=1}^{T}\text{APE}(t)}{T}, \quad (64)$$
where $T$ is the length of the testing signal. APE being a prediction error, average values of (64) close to zero correspond to good predictive performances.
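Equations (63) and (64) translate directly into a few lines of code; the sketch below assumes the true and predicted RUL series are given as equal-length arrays.

```python
import numpy as np

def absolute_prediction_error(rul_real, rul_pred):
    """APE(t) = |RUL_real(t) - RUL_pred(t)|, Equation (63)."""
    return np.abs(np.asarray(rul_real) - np.asarray(rul_pred))

def average_ape(rul_real, rul_pred):
    """Average APE over the testing signal, Equation (64)."""
    return float(np.mean(absolute_prediction_error(rul_real, rul_pred)))

print(average_ape([5, 4, 3, 2, 1, 0], [6, 4, 4, 2, 1, 0]))   # 0.333...
```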

The results for each of the 10 testing cases and the different HSMM configurations are reported in Tables 2(a) and 2(b) for the continuous and the discrete observation cases, respectively. As it can be noticed, the online Remaining Useful Lifetime estimation, and in general the online prediction of the time to a certain event, can be effectively faced with HSMMs, which achieve a reliable estimation power with a small prediction error.

Finally, we tested our RUL estimation methodology using the state duration estimation of (16) introduced by Azimi et al. [30–32]. The results are shown in Tables 3(a) and 3(b), in which, respectively, the prediction errors obtained for continuous and discrete observations are reported.

Comparing Table 2 and Table 3, one can notice that the proposed RUL method outperforms the one of Azimi. This


Figure 5: True, average, upper, and lower RUL versus time: (a) Remaining Useful Lifetime estimation for continuous data and Weibull duration distribution; (b) Remaining Useful Lifetime estimation for discrete data and Gamma duration distribution. HSMMs effectively solve RUL estimation problems: the prediction converges to the actual RUL value and its uncertainty decreases as the real failure time approaches.

Table 1: State recognition accuracy (values in %).

(a) Continuous observations
Test case   Gaussian   Gamma   Weibull
1           99.4       98.5    99.2
2           99.7       98.6    99.5
3           99.4       99.2    99.7
4           98.9       98.9    99.7
5           98.2       98.9    100
6           99.1       98.8    99.7
7           98.5       99.4    99.7
8           99.2       99.1    99.5
9           99.2       98.6    99.7
10          99.2       99.1    99.5
Average     99.1       98.9    99.6

(b) Discrete observations
Test case   Gaussian   Gamma   Weibull
1           97.4       96.7    97.4
2           97.2       97.6    96.5
3           99.4       95.8    96.6
4           98.2       95.3    97.7
5           99.1       97.4    97.5
6           97.8       97.7    97.8
7           95.8       97.2    96.6
8           97.7       96.4    97.2
9           98.9       97.2    98.5
10          99.2       95.6    96.9
Average     98.1       96.7    97.3

is mainly due to the proposed average state duration estimator of (20), compared to the one of Azimi given by (16).

5.2. Real Data. In this section we apply the proposed HSMM-based approach for RUL estimation to a real case study, using bearing monitoring data recorded during experiments carried out on the Pronostia experimental platform and made available for the IEEE Prognostics and Health Management (PHM) 2012 challenge [49]. The data correspond to normally degraded bearings, leading to cases which closely correspond to the industrial reality.

The choice of testing the proposed methodology on bearings derives from two facts: (i) bearings are the most critical components related to failures of rotating machines [50], and (ii) their monotonically increasing degradation pattern justifies the usage of left-right HSMM models.

5.2.1. Data Description. Pronostia is an experimental platform designed and realized at the Automatic Control and Micro-Mechatronic Systems (AS2M) Department of the Franche-Comte Electronics, Mechanics, Thermal Processing, Optics-Science and Technology (FEMTO-ST) Institute (http://www.femto-st.fr) (Besancon, France), with the aim of collecting real data related to the accelerated degradation of bearings. Such data are used to validate methods for bearing condition assessment, diagnostics, and prognostics [19, 51–59].

The Pronostia platform allows performing run-to-failure tests under constant or variable operating conditions. The operating conditions are determined by two factors that can be controlled online: (i) the rotation speed and (ii) the radial force load. During each experiment, temperature and vibration monitoring measurements are gathered online through two types of sensors placed in the bearing


Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20).

(a) APE of the RUL estimation for the continuous observation test cases
             Gaussian                  Gamma                     Weibull
Test case  APEavg  APEup  APElow    APEavg  APEup  APElow    APEavg  APEup  APElow
1            5.1   17.0    6.7       14.0   29.0    0.91       4.5   17.0    8.1
2            7.6   19.0    5.0        6.1   21.0    8.5        6.6   19.0    6.1
3            7.7    5.4   19.0        2.9   12.0   17.0       16.0   29.0    3.0
4            9.0   21.0    2.9        7.5   22.0    6.8        6.0   19.0    6.7
5            7.3   19.0    4.7        2.2   14.0   14.0        3.9   17.0    8.7
6            6.5   18.0    5.6        5.1   18.0   10.0       14.0   27.0    2.7
7            4.7   16.0    7.5        4.8   17.0   11.0        1.2   13.0   12.0
8           10.0   22.0    2.9        5.2   18.0   10.0        9.2   22.0    3.9
9            3.1    9.2   14.0        2.0   16.0   13.0        8.2   21.0    4.9
10           6.4   18.0    5.6        7.5   22.0    6.9        3.3   12.0   13.0
Average      6.8   17.0    7.4        5.7   19.0    9.9        7.3   20.0    7.0

(b) APE of the RUL estimation for the discrete observation test cases
             Gaussian                  Gamma                     Weibull
Test case  APEavg  APEup  APElow    APEavg  APEup  APElow    APEavg  APEup  APElow
1            2.1   11.0   14.0        3.1    8.8   14.0        2.4   12.0   13.0
2            2.1   11.0   13.0       11.0   22.0    3.3       19.0   32.0    7.1
3            5.1   17.0    7.6        6.6   18.0    5.1        2.3   14.0   11.0
4            5.9    6.5   18.0        5.2   17.0    6.7        4.2   16.0    9.0
5            3.2   14.0   10.0        8.3   19.0    3.4       12.0   24.0    2.9
6           12.0   24.0    2.7        6.2   18.0    5.2        4.1    8.4   16.0
7            2.9   15.0    9.7        9.3   21.0    2.3       19.0   31.0    6.6
8           15.0   27.0    7.0        7.4   18.0    4.3        4.3   17.0    9.4
9            5.9   18.0    7.7       11.0   23.0    5.5        3.9   16.0    8.8
10           3.5   11.0   14.0        5.5    6.0   16.0        5.2   17.0    7.1
Average      5.7   15.0   10.0        7.4   17.0    6.6        7.7   19.0    9.0

housing: a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor which develops a power equal to 250 W, two shafts, and a gearbox which allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human machine interface.

The load profile part issues a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.

Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32].

(a) APE of the RUL estimation for the continuous observation test cases
             Gaussian                  Gamma                     Weibull
Test case  APEavg  APEup  APElow    APEavg  APEup  APElow    APEavg  APEup  APElow
1           57.8   51.0   66.8       26.2    9.7   52.7       25.9   28.4   64.6
2           50.2   44.4   57.7       21.3   17.0   46.9       29.0   19.2   70.8
3           50.3   44.7   57.3       27.1    8.7   56.5       34.5   13.9   73.4
4           51.8   46.0   60.4       21.3   14.3   45.9       34.9   17.1   78.7
5           59.4   53.7   66.2       29.0    9.5   55.4       33.4   15.6   74.9
6           58.0   51.7   67.1       25.8    8.3   54.1       23.1   25.8   66.5
7           59.4   53.6   66.9       18.2   12.5   47.7       36.0   17.1   74.4
8           63.4   55.6   72.3       19.4   15.7   44.1       34.8   17.8   77.0
9           49.1   43.5   57.0       14.5   17.1   43.2       25.1   26.7   67.0
10          54.4   48.4   62.8       23.2    7.9   52.7       24.1   24.5   67.4
Average     55.4   49.3   63.5       22.6   12.1   49.9       30.1   20.6   71.5

(b) APE of the RUL estimation for the discrete observation test cases
             Gaussian                  Gamma                     Weibull
Test case  APEavg  APEup  APElow    APEavg  APEup  APElow    APEavg  APEup  APElow
1           51.4   41.0   62.4       42.4   31.8   53.0       32.6   26.4   73.6
2           49.6   39.9   60.4       59.5   48.3   70.8       31.3   27.6   69.3
3           50.2   38.6   62.3       46.5   35.7   57.4       32.4   25.7   70.2
4           42.2   31.5   53.8       50.1   40.5   60.6       23.7   36.1   60.3
5           44.3   33.9   55.8       47.8   37.4   59.1       36.0   25.6   76.5
6           52.2   43.2   62.7       55.2   44.3   66.9       27.2   31.6   64.3
7           55.0   43.9   66.8       56.0   45.7   67.0       34.7   23.2   74.4
8           50.3   39.0   62.0       60.4   50.5   71.0       35.1   26.4   72.4
9           55.5   47.4   64.0       48.0   37.2   59.5       31.8   22.2   73.6
10          49.0   38.2   60.7       52.1   41.2   63.1       29.4   28.9   68.7
Average     50.0   39.7   61.1       51.8   41.3   62.9       31.4   27.4   70.4

Themeasurement part consists of a data acquisition cardconnected to the monitoring sensors which provides theuser with the measured temperature and vibration data Thevibration measurements are provided in snapshots of 01 scollected each 10 seconds at a sampling frequency of 256 kHz(2560 samples per each snapshot) while the temperature hasbeen continuously recorded at a sampling frequency of 10Hz(600 samples collected each minute)

Further details on the Pronostia test rig can be found onthe data presentation paper [49] and on the web page of thedata challenge (httpwwwfemto-stfrenResearch-depart-mentsAS2MResearch-groupsPHMIEEE-PHM-2012-Data-challengephp)

16 Mathematical Problems in Engineering

Table 4 Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset[49]

Condition 1 Condition 2 Condition 31800 rpm and 4000N 1650 rpm and 4200N 1500 rpm and 5000N

Bearing Lifetime [s] Bearing Lifetime [s] Bearing Lifetime [s]Bearing1 1 28030 Bearing2 1 9110 Bearing3 1 5150Bearing1 2 8710 Bearing2 2 7970 Bearing3 2 16370Bearing1 3 23750 Bearing2 3 19550 Bearing3 3 4340Bearing1 4 14280 Bearing2 4 7510Bearing1 5 24630 Bearing2 5 23110Bearing1 6 24480 Bearing2 6 7010Bearing1 7 22590 Bearing2 7 2300

Loadmodule

Testedbearing

Dataacquisitionmodule

Rotatingmodule

Figure 6 Global overview of the Pronostia experimental platform[19]

Regarding the data provided for the PHM 2012 challenge3 different operating conditions were considered

(i) first operating conditions speed of 1800 rpm and loadof 4000 Newton

(ii) second operating conditions speed of 1650 rpm andload of 4200 Newton

(iii) third operating conditions speed of 1500 rpm andload of 5000 Newton

Under the above operating conditions a total of 17 accel-erated life tests were realized on bearings of type NSK 6804DD which can operate at a maximum speed of 13000 rpmand a load limit of 4000N The tests were stopped whenthe amplitude of the vibration signal was higher than 20 gthus this moment was defined as the bearing failure time Anexample of bearing before and after the experiment is shownin Figure 7 together with the corresponding vibration signalcollected during the whole test

Table 4 reports how the 17 tested bearings were separatedinto the three operating conditions Moreover the durationof each experiment being the RUL to be predicted for eachbearing is also given We performed two sets of experimentsby considering respectively the bearings relative to thefirst and the second operating condition (ie Bearing1 1Bearing1 2 Bearing1 7 and Bearing2 1 Bearing2 2 Bearing2 7)

As already mentioned the available data correspondto normally degraded bearings meaning that the defects

were not initially induced and that each degraded bearingcontains almost all the types of defects (balls rings and cage)resembling faithfully a common real industrial situationMoreover no assumption about the type of failure to beoccurred is provided with the data and since the variabilityin experiment durations is high (from 1 h to 7 h) performinggood estimates of the RUL is a difficult task [49]

In our experiments we considered as input to our modelthe horizontal channel of the accelerometerWe preprocessedthe raw signals by extracting two time-domain features thatis root mean square (RMS) and kurtosis within windowsof the same length as the given snapshots (119871 = 2560) Let119903119908(119905) be the raw signal of the 119908th window for each 119908 we

estimate RMS as 119909RMS119908

= radic(1119871)sum119871

119905=11199032119908(119905) and kurtosis as

119909KURT119908

= ((1119871)sum119871

119905=1(119903

119908(119905) minus 119903

119908)4)((1119871)sum

119871

119905=1(119903

119908(119905) minus 119903

119908)2)2

where 119903119908is the mean of 119903

119908 An example of feature extraction

for Bearing1 1 is shown in Figure 8To assess the performance of the proposed HSMM after

the model selection procedure we implemented a leave-one-out cross validation scheme by considering separatelyconditions 1 and 2 we performed the online RUL estimationfor each of the 7 bearings using an HSMM trained with theremaining 6 bearing histories Similarly to the simulated casewe considered the average absolute prediction error definedin (64) to quantitatively evaluate our method

522 Bearings RUL Estimation We performed our exper-iments in two steps firstly we applied model selection inorder to determine an optimalmodel structure and secondlywe estimated the RUL of the bearings The full procedure isdetailed in the following

(A) HSMM Structure To determine an appropriate HSMMstructure for effectively modeling the considered data weconsidered several HSMM structures characterized by (i)the duration distribution family (being Gaussian Gamma orWeibull) (ii) an increasing number of states119873 from 2 to 6and (iii) an increasing number of Gaussian mixtures 119872 inthe observation density from 1 to 4 For eachmodel structureobtained by systematically considering all the combinationsof (i) to (iii) we run 120 parameter learnings correspondingto 120 random initializations 1205820 on the data sets (Bearing1 1

Mathematical Problems in Engineering 17

times106

0 1 2 3 4 5 6 7

50

40

30

20

10

0

minus10

minus20

minus30

minus40

minus50

Figure 7 A tested bearing before and after the experiment with its recorded vibration signal [49]

(b)(a)

0 280300

7

Extracted feature RMS

Time (s)

0 280300

60

Time (s)

Extracted feature kurtosis

Featureextraction

Win

dow

1

Win

dow

2

Win

dow

3

Win

dow

Win

dow

r(t)

x1 x2 x3

middot middot middot

middot middot middot

nminus1

n

xnminus1 xn

Figure 8 Raw vibration data (a) versus RMS and kurtosis features (b) for Bearing1 1

Bearing1 2 Bearing1 7 and Bearing2 1 Bearing2 2 Bearing2 7)

Similar to Section 512 at the end of each learningwe evaluated the AIC value (Equation (46)) as reported inFigures 9(a) and 9(b) for conditions 1 and 2 respectively Inboth cases the global minimumAIC value corresponds to anHSMM with 119873 = 4 states a Weibull duration model and a119872 = 1 Gaussians mixture for the observation density

(B) RUL Estimation Using the above obtained optimalHSMM structure we trained it via a leave-one-out crossvalidation scheme by using for condition 1 at each iteration

Bearing1 i 1 le 119894 le 7 as the testing bearing while theremaining six bearings were used for training Once thetrained parameters 120582lowast

119894were estimated for the 119894th testing bear-

ing we progressively collected the observations of the testedBearing1 i to calculate at each time 119905 the average lower andupper RUL as specified in (57) (58) and (59) respectivelyconsidering the state 119878

4as the failure state The same proce-

dure has been performed for the bearings in condition 2Examples of RUL estimation for Bearing1 7 and Bear-

ing2 6 are shown in Figures 10(a) and 10(b) respectivelywhere the black solid line represents the real RUL which goesto zero as the time goes on As it can be seen the average as

18 Mathematical Problems in Engineering

2 3 4 5 60

005

01

015

02

025

03

Number of states

Gaussian duration modelGamma duration modelWeibull duration model

(a) AIC values for Condition 1

2 3 4 5 60

01

02

03

04

05

06

Number of states

Gaussian duration modelGamma duration modelWeibull duration model

(b) AIC values for Condition 2

Figure 9 In both cases the minimum AIC value corresponds to an HSMMwith119873 = 4 states a Weibull duration model and119872 = 1mixturein the observation density

0 22590

22590

45000

Time (s)

True RULUpper RUL

Average RULLower RUL

(a) RUL estimation for Bearing1 7

True RULUpper RUL

Average RULLower RUL

0 70100

7010

19000

Time (s)

(b) RUL estimation for Bearing2 6

Figure 10 By obtaining a low average absolute prediction error the proposed parametric HSMM is effective for estimating the RemainingUseful Lifetime of bearings

well as the lower and the upper bound estimations convergeto the real RUL as the real failure time approaches and theuncertainty about the estimation decreases with time

Concerning the quantitative estimation of the predictiveperformances we report in Table 5 the average absoluteprediction error of the RUL estimation (see Equation (64))expressed in seconds As it can be noticed the averageabsolute prediction error of the average RUL is respectively 1hour and 15minutes for condition 1 and 1 hour and 5minutesfor condition 2 which are good values considering the highvariability in the training set durations and the fact that theperformance metric takes into account also the less accuratepredictions performed at an early stage of the bearings lifeMoreover for both conditions the average prediction errors

of 5 tests out of 7 are below the average while the best averageerror of themeanRUL is only 23minutes for condition 1whileit further decreases to 14 minutes for condition 2

6 Conclusion and Future Work

In this paper we introduced an approach based on HiddenSemi-Markov Models (HSMM) and Akaike InformationCriteria (AIC) to perform (i) automatic model selection (ii)online condition monitoring and (iii) online time to eventestimation

The proposed HSMM models the state duration distri-bution with a parametric density allowing a less computa-tionally expensive learning procedure due to the few required

Mathematical Problems in Engineering 19

Table 5 Average absolute prediction error (APE) of the RUL esti-mation expressed in seconds

(a) Condition 1

Test Bearings APEavg APElow APEup

Bearing1 1 105716 127230 94146Bearing1 2 43312 38156 38213Bearing1 3 29970 97309 60912Bearing1 4 63363 28766 148719Bearing1 5 19689 74484 104115Bearing1 6 42530 98964 97937Bearing1 7 13880 74943 100881Average 45494 77122 92132

(b) Condition 2

Test Bearings APEavg APElow APEup

Bearing2 1 24759 50065 72875Bearing2 2 16473 44972 82886Bearing2 3 88771 95083 79621Bearing2 4 17698 42486 49825Bearing2 5 86631 104900 107300Bearing2 6 8771 35047 66870Bearing2 7 30125 38664 66519Average 39033 58745 75128

parameters to estimate Together with the provided generalmodel specification the modified learning inference andprediction algorithms allow the usage of any parametricdistribution tomodel the state duration as well as continuousor discrete observations As a consequence a wide class ofdifferent applications can be modeled with the proposedmethodology

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a given event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.
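The automatic model selection in point (i) amounts to a grid search over candidate structures scored by AIC. The sketch below illustrates that loop; `train_hsmm` is a hypothetical placeholder for an HSMM training routine returning an object exposing `log_likelihood` and `n_free_parameters`, and the AIC is computed in its standard form (2k - 2 ln L), which may differ from the normalization used in Equation (46) of this paper.

```python
def aic(log_lik, n_params):
    # Standard Akaike Information Criterion: lower is better.
    return 2 * n_params - 2 * log_lik

def select_hsmm_structure(train_data, train_hsmm, restarts=120):
    """Grid search over duration family, number of states N, and mixtures M."""
    best = None
    for family in ("gaussian", "gamma", "weibull"):
        for n_states in range(2, 7):
            for n_mix in range(1, 5):
                # Keep the best of several random initializations.
                fits = [train_hsmm(train_data, family, n_states, n_mix)
                        for _ in range(restarts)]
                model = max(fits, key=lambda m: m.log_likelihood)
                score = aic(model.log_likelihood, model.n_free_parameters)
                if best is None or score < best[0]:
                    best = (score, family, n_states, n_mix, model)
    return best
```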

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20), namely

$$
\hat{d}_{t+1}(i)=\frac{a_{ii}\bigl(\hat{d}_t\bigr)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\bigl(\hat{d}_t(i)+1\bigr).
\tag{A.1}
$$

The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution:

$$
d_t(i)\sim f(d).
\tag{A.2}
$$

We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as

$$
P\bigl(d_t(i)=d\bigr)=P\bigl(s_{t-d-1}\neq S_i,\,s_{t-d}=S_i,\,\ldots,\,s_{t-1}=S_i\mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t,\,\lambda\bigr).
\tag{A.3}
$$

We omit the conditioning on the model parameters $\lambda$ in the following equations, it being inherently implied. We are interested in deriving the estimator $\hat{d}_t(i)$ of $d_t(i)$, defined as its expected value (see Equation (15)):

$$
\hat{d}_t(i)=\mathrm{E}\bigl(d_t(i)\mid s_t=S_i,\,\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t\bigr),\qquad 1\le i\le N.
\tag{A.4}
$$

From the definition of expectation we have

$$
\hat{d}_t(i)=\sum_{d=1}^{t} d\cdot P\bigl(d_t(i)=d\bigr)
=\sum_{d=1}^{t} d\cdot P\bigl(s_{t-d-1}\neq S_i,\,s_{t-d}=S_i,\,\ldots,\,s_{t-1}=S_i\mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr).
\tag{A.5}
$$

For $\hat{d}_{t+1}(i)$ we have

$$
\hat{d}_{t+1}(i)=\sum_{d=1}^{t+1} d\cdot P\bigl(s_{t-d}\neq S_i,\,s_{t-d+1}=S_i,\,\ldots,\,s_{t}=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)
=\underbrace{P\bigl(s_{t-1}\neq S_i,\,s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{(a)}
\tag{A.6}
$$

$$
+\sum_{d=2}^{t+1} d\cdot P\bigl(s_{t-d}\neq S_i,\,s_{t-d+1}=S_i,\,\ldots,\,s_{t}=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr).
\tag{A.7}
$$

By noticing that

$$
P\bigl(s_{t-d}\neq S_i,\,s_{t-d+1}=S_i,\,\ldots,\,s_{t-1}=S_i\mid s_t=S_i,\,s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)
=\frac{P\bigl(s_{t-d}\neq S_i,\,s_{t-d+1}=S_i,\,\ldots,\,s_{t-1}=S_i,\,s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}{P\bigl(s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)},
\tag{A.8}
$$

we can replace the probability in the second term of (A.7) with

$$
P\bigl(s_{t-d}\neq S_i,\,s_{t-d+1}=S_i,\,\ldots,\,s_{t-1}=S_i,\,s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)
=\underbrace{P\bigl(s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{(b)}
\tag{A.9}
$$

$$
\cdot\,P\bigl(s_{t-d}\neq S_i,\,s_{t-d+1}=S_i,\,\ldots,\,s_{t-1}=S_i\mid s_t=S_i,\,s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr).
\tag{A.10}
$$

In the last factor of (A.10) we can omit the information about the current state and observation by observing that

$$
P\bigl(s_{t-d}\neq S_i,\,s_{t-d+1}=S_i,\,\ldots,\,s_{t-1}=S_i\mid s_t=S_i,\,s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)
\approx\underbrace{P\bigl(s_{t-d}\neq S_i,\,s_{t-d+1}=S_i,\,\ldots,\,s_{t-1}=S_i\mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t}\bigr)}_{(c)},
\tag{A.11}
$$

if the following independencies hold:

$$
s_{t+1}\perp s_{t-d+1},\ldots,s_{t-1}\mid s_t,\,\mathbf{x}_1,\ldots,\mathbf{x}_t,
\qquad
X_{t+1}\perp s_{t-d+1},\ldots,s_{t-1}\mid s_t,\,\mathbf{x}_1,\ldots,\mathbf{x}_t,
\tag{A.12}
$$

where with $\perp$ we denote independence. Equation (A.12) holds for HMMs (even without conditioning on $\mathbf{x}_1,\ldots,\mathbf{x}_t$), but it does not hold for HSMMs, since the state duration (expressed by $s_{t-d+1},\ldots,s_{t-1}$) determines the system evolution. On the other hand, the state duration is partially known through the observations $\mathbf{x}_1,\ldots,\mathbf{x}_t$. Thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A.6), (A.9), and (A.11) we obtain

$$
\begin{aligned}
\hat{d}_{t+1}(i)&=(a)+\sum_{d=2}^{t+1} d\cdot(b)\cdot(c)\\
&=\underbrace{P\bigl(s_{t-1}\neq S_i,\,s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{P(A,B\mid C)=P(A\mid B,C)\cdot P(B\mid C)}
+\sum_{d=2}^{t+1} d\cdot\underbrace{P\bigl(s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{\text{does not depend on }d}
\cdot P\bigl(s_{t-d}\neq S_i,\,\ldots,\,s_{t-1}=S_i\mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t}\bigr)\\
&=P\bigl(s_{t-1}\neq S_i\mid s_t=S_i,\,s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot P\bigl(s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\\
&\quad+P\bigl(s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot\sum_{d=2}^{t+1} d\cdot P\bigl(s_{t-d}\neq S_i,\,\ldots,\,s_{t-1}=S_i\mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t}\bigr)\\
&=P\bigl(s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot\Bigl[\underbrace{P\bigl(s_{t-1}\neq S_i\mid s_t=S_i,\,s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t},\mathbf{x}_{t+1}\bigr)}_{\text{for the approximation of (A.11)}}
+\sum_{d=2}^{t+1} d\cdot P\bigl(s_{t-d}\neq S_i,\,\ldots,\,s_{t-1}=S_i\mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t}\bigr)\Bigr]\\
&=P\bigl(s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot\Bigl[P\bigl(s_{t-1}\neq S_i\mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t}\bigr)
+\sum_{d'=1}^{t}\bigl(d'+1\bigr)\cdot P\bigl(s_{t-d'-1}\neq S_i,\,s_{t-d'}=S_i,\,\ldots,\,s_{t-1}=S_i\mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t}\bigr)\Bigr].
\end{aligned}
\tag{A.13}
$$

Noticing that

$$
\sum_{d'=1}^{t} P\bigl(s_{t-d'-1}\neq S_i,\,s_{t-d'}=S_i,\,\ldots,\,s_{t-1}=S_i\mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)
+P\bigl(s_{t-1}\neq S_i\mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)=1,
\tag{A.14}
$$

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, and using (A.5), we can rewrite (A.13) as follows:

$$
\hat{d}_{t+1}(i)=P\bigl(s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot\bigl(\hat{d}_t(i)+1\bigr).
\tag{A.15}
$$

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus one, weighted by the "amount" of the current state that was already in state $i$ in the previous step.

In order to express (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $\hat{d}_{t+1}(i)$, we can consider the following equality:

$$
P\bigl(s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)
=\frac{P\bigl(s_t=S_i,\,s_{t+1}=S_i\mid\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}{\underbrace{P\bigl(s_{t+1}=S_i\mid\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{\gamma_{t+1}(i)}}.
\tag{A.16}
$$

If we consider the terms involved in the probability at the numerator of the right-hand side of (A.16), we have that

$$
\underbrace{\mathbf{x}_1,\ldots,\mathbf{x}_t}_{B}\;\perp\;\underbrace{\mathbf{x}_{t+1}}_{C}\;\mid\;\underbrace{s_t=S_i,\,s_{t+1}=S_i}_{A}.
\tag{A.17}
$$

If $B\perp C\mid A$, by the Bayes rule we have that

$$
P\bigl(A\mid C,B\bigr)=\frac{P\bigl(C\mid A,B\bigr)\cdot P\bigl(A\mid B\bigr)}{P\bigl(C\mid B\bigr)}.
\tag{A.18}
$$

Hence we can rewrite the numerator of the right-hand side of (A.16) as follows:

$$
\begin{aligned}
P\bigl(s_t=S_i,\,s_{t+1}=S_i\mid\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)
&=\frac{P\bigl(s_t=S_i,\,s_{t+1}=S_i\mid\mathbf{x}_1,\ldots,\mathbf{x}_{t}\bigr)\cdot\overbrace{P\bigl(\mathbf{x}_{t+1}\mid s_t=S_i,\,s_{t+1}=S_i\bigr)}^{\mathbf{x}_{t+1}\perp s_t\mid s_{t+1}}}{P\bigl(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\ldots,\mathbf{x}_{t}\bigr)}\\[4pt]
&=\frac{P\bigl(s_{t+1}=S_i\mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t}\bigr)\cdot\overbrace{P\bigl(s_t=S_i\mid\mathbf{x}_1,\ldots,\mathbf{x}_{t}\bigr)}^{\gamma_t(i)}\cdot\overbrace{P\bigl(\mathbf{x}_{t+1}\mid s_{t+1}=S_i\bigr)}^{b_i(\mathbf{x}_{t+1})}}{P\bigl(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\ldots,\mathbf{x}_{t}\bigr)}.
\end{aligned}
\tag{A.19}
$$

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

$$
P\bigl(s_{t+1}=S_i\mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t}\bigr)
=\sum_{d_t} a_{ii}(d_t)\cdot P\bigl(d_t\mid\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)
\approx a_{ii}\bigl(\hat{d}_t\bigr),
\tag{A.20}
$$

while the denominator of (A.19) can be expressed as follows:

$$
P\bigl(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)
=\frac{P\bigl(\mathbf{x}_1,\ldots,\mathbf{x}_t,\mathbf{x}_{t+1}\bigr)}{P\bigl(\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)}
=\frac{\sum_{i=1}^{N}\alpha_{t+1}(i)}{\sum_{i=1}^{N}\alpha_{t}(i)}.
\tag{A.21}
$$

By substituting (A.20) and (A.21) in (A.19) we obtain

$$
P\bigl(s_t=S_i,\,s_{t+1}=S_i\mid\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)
=\frac{a_{ii}\bigl(\hat{d}_t\bigr)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)},
\tag{A.22}
$$

and then, by combining (A.22) and (A.16), we obtain

$$
P\bigl(s_t=S_i\mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)
=\frac{a_{ii}\bigl(\hat{d}_t\bigr)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\,\sum_{i=1}^{N}\alpha_{t+1}(i)}.
\tag{A.23}
$$

Finally, by substituting (A.23) in (A.15) and considering that

$$
\gamma_t(i)=\frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)},
\tag{A.24}
$$

we derive the induction formula for $\hat{d}_{t+1}(i)$ in terms of model parameters as

$$
\hat{d}_{t+1}(i)=\frac{a_{ii}\bigl(\hat{d}_t\bigr)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\bigl(\hat{d}_t(i)+1\bigr).
\tag{A.25}
$$
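The induction (A.25), that is, Equation (20), is straightforward to apply once the forward variables are available. The following minimal Python sketch updates the expected duration of every state when a new observation arrives; the arrays `alpha_t`, `alpha_t1`, `a_ii_d`, and `b_x1` stand for the forward variables at times t and t+1, the duration-dependent self-transition probabilities evaluated at the current duration estimates, and the observation likelihoods b_i(x_{t+1}), and how these quantities are computed (forward recursion, duration model, observation mixture) is assumed to be provided elsewhere. The example values are illustrative, not taken from the paper.

```python
import numpy as np

def update_expected_durations(d_hat, alpha_t, alpha_t1, a_ii_d, b_x1, eps=1e-300):
    """One step of the duration induction (A.25):

        d_hat_{t+1}(i) = a_ii(d_hat_t) * alpha_t(i) * b_i(x_{t+1}) / alpha_{t+1}(i)
                         * (d_hat_t(i) + 1)

    All inputs are 1-D arrays of length N (one entry per hidden state).
    """
    d_hat = np.asarray(d_hat, dtype=float)
    weight = (a_ii_d * alpha_t * b_x1) / np.maximum(alpha_t1, eps)
    return weight * (d_hat + 1.0)

# Illustrative values for N = 4 states.
alpha_t  = np.array([0.70, 0.20, 0.08, 0.02])
alpha_t1 = np.array([0.55, 0.30, 0.12, 0.03])
a_ii_d   = np.array([0.90, 0.85, 0.80, 0.75])   # a_ii evaluated at the current d_hat
b_x1     = np.array([0.80, 0.50, 0.30, 0.10])   # b_i(x_{t+1})
d_hat    = np.array([10.0, 3.0, 1.0, 0.5])
print(update_expected_durations(d_hat, alpha_t, alpha_t1, a_ii_d, b_x1))
```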

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka, Japan, and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines - prognostics - part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.


dure has been performed for the bearings in condition 2Examples of RUL estimation for Bearing1 7 and Bear-

ing2 6 are shown in Figures 10(a) and 10(b) respectivelywhere the black solid line represents the real RUL which goesto zero as the time goes on As it can be seen the average as

18 Mathematical Problems in Engineering

2 3 4 5 60

005

01

015

02

025

03

Number of states

Gaussian duration modelGamma duration modelWeibull duration model

(a) AIC values for Condition 1

2 3 4 5 60

01

02

03

04

05

06

Number of states

Gaussian duration modelGamma duration modelWeibull duration model

(b) AIC values for Condition 2

Figure 9 In both cases the minimum AIC value corresponds to an HSMMwith119873 = 4 states a Weibull duration model and119872 = 1mixturein the observation density

0 22590

22590

45000

Time (s)

True RULUpper RUL

Average RULLower RUL

(a) RUL estimation for Bearing1 7

True RULUpper RUL

Average RULLower RUL

0 70100

7010

19000

Time (s)

(b) RUL estimation for Bearing2 6

Figure 10 By obtaining a low average absolute prediction error the proposed parametric HSMM is effective for estimating the RemainingUseful Lifetime of bearings

well as the lower and the upper bound estimations convergeto the real RUL as the real failure time approaches and theuncertainty about the estimation decreases with time

Concerning the quantitative estimation of the predictiveperformances we report in Table 5 the average absoluteprediction error of the RUL estimation (see Equation (64))expressed in seconds As it can be noticed the averageabsolute prediction error of the average RUL is respectively 1hour and 15minutes for condition 1 and 1 hour and 5minutesfor condition 2 which are good values considering the highvariability in the training set durations and the fact that theperformance metric takes into account also the less accuratepredictions performed at an early stage of the bearings lifeMoreover for both conditions the average prediction errors

of 5 tests out of 7 are below the average while the best averageerror of themeanRUL is only 23minutes for condition 1whileit further decreases to 14 minutes for condition 2

6 Conclusion and Future Work

In this paper we introduced an approach based on HiddenSemi-Markov Models (HSMM) and Akaike InformationCriteria (AIC) to perform (i) automatic model selection (ii)online condition monitoring and (iii) online time to eventestimation

The proposed HSMM models the state duration distri-bution with a parametric density allowing a less computa-tionally expensive learning procedure due to the few required

Mathematical Problems in Engineering 19

Table 5 Average absolute prediction error (APE) of the RUL esti-mation expressed in seconds

(a) Condition 1

Test Bearings APEavg APElow APEup

Bearing1 1 105716 127230 94146Bearing1 2 43312 38156 38213Bearing1 3 29970 97309 60912Bearing1 4 63363 28766 148719Bearing1 5 19689 74484 104115Bearing1 6 42530 98964 97937Bearing1 7 13880 74943 100881Average 45494 77122 92132

(b) Condition 2

Test Bearings APEavg APElow APEup

Bearing2 1 24759 50065 72875Bearing2 2 16473 44972 82886Bearing2 3 88771 95083 79621Bearing2 4 17698 42486 49825Bearing2 5 86631 104900 107300Bearing2 6 8771 35047 66870Bearing2 7 30125 38664 66519Average 39033 58745 75128

parameters to estimate Together with the provided generalmodel specification the modified learning inference andprediction algorithms allow the usage of any parametricdistribution tomodel the state duration as well as continuousor discrete observations As a consequence a wide class ofdifferent applications can be modeled with the proposedmethodology

This paper highlights through experiments performedon simulated data that the proposed approach is effectivein (i) automatically selecting the correct configuration of theHSMM in terms of number of states and correct durationdistribution family (ii) performing online state estimationand (iii) correctly predict the time to a determinate eventidentified as the entrance of the model in a target state Asa consequence the proposed parametric HSMM combinedwith AIC can be used in practice for condition monitoringand Remaining Useful Lifetime applications

As the targeted application of the proposed methodologyis failure prognosis in industrial machines combining theproposed HSMM model with online learning procedurecapable of adapting the model parameter to new conditionswould be considered in a future extension

Appendix

In this appendix we give the derivation of the state durationvariable introduced in (20) as

119889119905+1

(119894) =119886119894119894(d

119905) sdot 120572

119905(119894) sdot 119887

119894(119883

119905+1)

120572119905+1

(119894)sdot (119889

119905(119894) + 1) (A1)

The random variable 119889119905(119894) has been defined in Section 21

as the duration spent in state 119894 prior to current time 119905 assumingthat the state at current time 119905 be 119894 119889

119905(119894) is sampled from an

arbitrary distribution

119889119905(119894) sim 119891 (119889) (A2)

We can specify the probability that the system has been instate 119894 for 119889 time units prior to current time 119905 giving theobservations and the model parameters 120582 and knowing thatthe current state is 119894 as

P (119889119905(119894) = 119889) = P (119904

119905minus119889minus1= 119878

119894 119904

119905minus119889= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905 120582)

(A3)

We omit the conditioning to the model parameters 120582 inthe following equations being inherently implied We areinterested to derive the estimator 119889

119905(119894) of 119889

119905(119894) defined as its

expected value (see Equation (15))

119889119905(119894) = E (119889

119905(119894) | 119904

119905= 119878

119894 x

1x2sdot sdot sdot x

119905) 1 le 119894 le 119873 (A4)

From the definition of expectation we have

119889119905(119894) =

119905

sum

119889=1

119889 sdot P (119889119905(119894) = 119889)

=

119905

sum

119889=1

119889 sdot P (119904119905minus119889minus1

= 119878119894 119904

119905minus119889= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905)

(A5)

For 119889119905+1

(119894) we have

119889119905+1

(119894) =

119905+1

sum

119889=1

119889 sdot P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905= 119878

119894|

119904119905+1

= 119878119894 x

1 x

119905+1)

= P (119904119905minus1

= 119878119894 119904

119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

(119886)

(A6)

+

119905+1

sum

119889=2

119889 sdot P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905= 119878

119894|

119904119905+1

= 119878119894 x

1 x

119905+1)

(A7)

By noticing that

P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 119904

119905+1= 119878

119894 x

1 x

119905+1)

= (P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894 119904

119905= 119878

119894|

119904119905+1

= 119878119894 x

1 x

119905+1))

sdot (P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1))

minus1

(A8)

20 Mathematical Problems in Engineering

we can replace the probability of the second termof (A7)with

P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894 119904

119905= 119878

119894|

119904119905+1

= 119878119894 x

1 x

119905+1)

= P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

(119887)

(A9)

sdot P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 119904

119905+1= 119878

119894 x

1 x

119905+1)

(A10)

In the last factor of (A10) we can omit the information aboutthe current state and observation by observing that

P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 119904

119905+1= 119878

119894 x

1 x

119905+1)

asymp P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894| 119904

119905= 119878

119894 x

1 x

119905)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

(119888)

(A11)

if the following independencies hold

119904119905+1

perp 119904119905minus119889+1

119904119905minus1

| 119904119905 x

1 x

119905

119883119905+1

perp 119904119905minus119889+1

119904119905minus1

| 119904119905 x

1 x

119905

(A12)

wherewithperpwe denote independency Equation (A12) holdsforHMMs (evenwithout conditioning on x

1 x

119905) but they

do not hold for HSMMs since the state duration (expressedby 119904

119905minus119889+1 119904

119905minus1) determines the system evolution On

the other hand state duration is partially known by theobservtions x

1 x

119905Thus the approximation is reasonable

as long as the uncertainty on the states is within limitsFrom (A6) (A9) and (A11) we obtain

119889119905+1

(119894) = (119886) +

119905+1

sum

119889=2

119889 sdot (119887) sdot (119888)

= P (119904119905minus1

= 119878119894 119904

119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

P(119860119861|119862)=P(119860|119861119862)sdotP(119861|119862)

+

119905+1

sum

119889=2

119889 sdot P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

it does not depend on 119889

sdot P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905)

= P (119904119905minus1

= 119878119894| 119904

119905= 119878

119894 119904

119905+1= 119878

119894 x

1 x

119905+1)

sdot P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)

+ P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)

sdot

119905+1

sum

119889=2

119889 sdot P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905)

= P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)

sdot [

[

P (119904119905minus1

= 119878119894| 119904

119905= 119878

119894119904

119905+1= 119878

119894 x

1 x

119905x

119905+1)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

for the approximation of (A11)

+

119905+1

sum

119889=2

119889 sdot P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905)]

]

= P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)

sdot [P (119904119905minus1

= 119878119894| 119904

119905= 119878

119894 x

1 x

119905) +

119905

sum

1198891015840=1

(1198891015840+ 1)

sdot P (119904119905minus1198891015840minus1

= 119878119894 119904

119905minus1198891015840 = 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905)]

(A13)

Noticing that
$$
\sum_{d'=1}^{t} P\big(s_{t-d'-1}\neq S_i,\ s_{t-d'}=S_i,\dots,s_{t-1}=S_i \mid s_t=S_i,\ \mathbf{x}_1,\dots,\mathbf{x}_t\big)
+ P\big(s_{t-1}\neq S_i \mid s_t=S_i,\ \mathbf{x}_1,\dots,\mathbf{x}_t\big) = 1,
\tag{A14}
$$
because it is the sum of the probabilities of all the possible state sequences up to the current time $t$, we can rewrite (A13) as
$$
\hat d_{t+1}(i) = P\big(s_t=S_i \mid s_{t+1}=S_i,\ \mathbf{x}_1,\dots,\mathbf{x}_{t+1}\big)\cdot\big(\hat d_t(i)+1\big).
\tag{A15}
$$
The intuition behind this induction formula is that the current average duration equals the previous average duration plus one, weighted by the "amount" of the current state that was already in state $i$ at the previous step.
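As a purely illustrative numerical example (the numbers are not taken from the experiments), suppose $\hat d_t(i)=3.2$ and $P(s_t=S_i \mid s_{t+1}=S_i,\ \mathbf{x}_1,\dots,\mathbf{x}_{t+1})=0.9$; then (A15) gives $\hat d_{t+1}(i)=0.9\cdot(3.2+1)=3.78$. Most of the previous sojourn is carried over and extended by one time step, while the remaining mass accounts for the possibility that state $i$ has only just been entered.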

In order to express (A15) in terms of the model parameters, so that the induction for $\hat d_{t+1}(i)$ can be computed numerically, we can consider the following equality:
$$
P\big(s_t=S_i \mid s_{t+1}=S_i,\ \mathbf{x}_1,\dots,\mathbf{x}_{t+1}\big)
= \frac{P\big(s_t=S_i,\ s_{t+1}=S_i \mid \mathbf{x}_1,\dots,\mathbf{x}_{t+1}\big)}{\underbrace{P\big(s_{t+1}=S_i \mid \mathbf{x}_1,\dots,\mathbf{x}_{t+1}\big)}_{\gamma_{t+1}(i)}}.
\tag{A16}
$$

Considering the terms involved in the probability at the numerator of the right-hand side of (A16), we have
$$
\underbrace{\mathbf{x}_1,\dots,\mathbf{x}_t}_{B} \;\perp\; \underbrace{\mathbf{x}_{t+1}}_{C} \;\Big|\; \underbrace{s_t=S_i,\ s_{t+1}=S_i}_{A}.
\tag{A17}
$$
If $B \perp C \mid A$, then by Bayes' rule
$$
P(A \mid C, B) = \frac{P(C \mid A, B)\cdot P(A \mid B)}{P(C \mid B)}.
\tag{A18}
$$
Hence we can rewrite the numerator of the right-hand side of (A16) as
$$
\begin{aligned}
P\big(s_t=S_i,\ s_{t+1}=S_i \mid \mathbf{x}_1,\dots,\mathbf{x}_{t+1}\big)
&= \frac{P\big(s_t=S_i,\ s_{t+1}=S_i \mid \mathbf{x}_1,\dots,\mathbf{x}_t\big)\cdot \overbrace{P\big(\mathbf{x}_{t+1} \mid s_t=S_i,\ s_{t+1}=S_i\big)}^{\mathbf{x}_{t+1}\,\perp\, s_t \,\mid\, s_{t+1}}}{P\big(\mathbf{x}_{t+1} \mid \mathbf{x}_1,\dots,\mathbf{x}_t\big)} \\
&= \frac{P\big(s_{t+1}=S_i \mid s_t=S_i,\ \mathbf{x}_1,\dots,\mathbf{x}_t\big)\cdot \overbrace{P\big(s_t=S_i \mid \mathbf{x}_1,\dots,\mathbf{x}_t\big)}^{\gamma_t(i)} \cdot \overbrace{P\big(\mathbf{x}_{t+1} \mid s_{t+1}=S_i\big)}^{b_i(\mathbf{x}_{t+1})}}{P\big(\mathbf{x}_{t+1} \mid \mathbf{x}_1,\dots,\mathbf{x}_t\big)}.
\end{aligned}
\tag{A19}
$$

The first probability in the numerator of (A19) is the state transition, which can be approximated by using the average duration:
$$
P\big(s_{t+1}=S_i \mid s_t=S_i,\ \mathbf{x}_1,\dots,\mathbf{x}_t\big)
= \sum_{\mathbf{d}_t} a_{ii}(\mathbf{d}_t)\cdot P\big(\mathbf{d}_t \mid \mathbf{x}_1,\dots,\mathbf{x}_t\big)
\;\approx\; a_{ii}\big(\hat{\mathbf{d}}_t\big),
\tag{A20}
$$
while the denominator of (A19) can be expressed as
$$
P\big(\mathbf{x}_{t+1} \mid \mathbf{x}_1,\dots,\mathbf{x}_t\big)
= \frac{P\big(\mathbf{x}_1,\dots,\mathbf{x}_t,\mathbf{x}_{t+1}\big)}{P\big(\mathbf{x}_1,\dots,\mathbf{x}_t\big)}
= \frac{\sum_{i=1}^{N}\alpha_{t+1}(i)}{\sum_{i=1}^{N}\alpha_t(i)}.
\tag{A21}
$$

By substituting (A20) and (A21) in (A19) we obtain
$$
P\big(s_t=S_i,\ s_{t+1}=S_i \mid \mathbf{x}_1,\dots,\mathbf{x}_{t+1}\big)
= \frac{a_{ii}\big(\hat{\mathbf{d}}_t\big)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)},
\tag{A22}
$$
and then, by combining (A22) and (A16), we obtain
$$
P\big(s_t=S_i \mid s_{t+1}=S_i,\ \mathbf{x}_1,\dots,\mathbf{x}_{t+1}\big)
= \frac{a_{ii}\big(\hat{\mathbf{d}}_t\big)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\,\sum_{i=1}^{N}\alpha_{t+1}(i)}.
\tag{A23}
$$
Finally, by substituting (A23) in (A15) and considering that
$$
\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)},
\tag{A24}
$$
we derive the induction formula for $\hat d_{t+1}(i)$ in terms of the model parameters:
$$
\hat d_{t+1}(i) = \frac{a_{ii}\big(\hat{\mathbf{d}}_t\big)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\big(\hat d_t(i)+1\big).
\tag{A25}
$$
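Once the forward variables are available, the recursion (A25) amounts to a few array operations per time step. The following minimal Python sketch is only an illustration of (A25): all names are hypothetical, and the forward variables $\alpha_t(i)$, $\alpha_{t+1}(i)$, the emission likelihoods $b_i(\mathbf{x}_{t+1})$, and the duration-dependent self-transitions $a_{ii}(\hat{\mathbf{d}}_t)$ are assumed to be supplied by the inference procedure described in the paper.

```python
import numpy as np

def update_average_durations(d_hat, alpha_t, alpha_tp1, b_xtp1, self_transition):
    """One step of the duration induction (A25); illustrative sketch only.

    d_hat           : (N,) array, current estimates of the average durations d_t(i)
    alpha_t         : (N,) array, forward variables alpha_t(i)
    alpha_tp1       : (N,) array, forward variables alpha_{t+1}(i)
    b_xtp1          : (N,) array, emission likelihoods b_i(x_{t+1})
    self_transition : callable mapping the duration estimates to the vector a_ii(d_t)
    """
    a_ii = self_transition(d_hat)            # duration-dependent self-transition probabilities
    w = a_ii * alpha_t * b_xtp1 / alpha_tp1  # approximates P(s_t = S_i | s_{t+1} = S_i, x_1..x_{t+1})
    return w * (d_hat + 1.0)                 # equation (A25)

# Hypothetical usage with N = 3 states.
rng = np.random.default_rng(0)
d_hat = np.array([4.0, 1.5, 0.2])
alpha_t = rng.random(3)
b_xtp1 = rng.random(3)
self_transition = lambda d: np.full(3, 0.9)  # placeholder self-transition model
# alpha_{t+1} also receives mass from the other states, so the weight stays below 1
alpha_tp1 = (self_transition(d_hat) * alpha_t + 0.05 * alpha_t.sum()) * b_xtp1
print(update_average_durations(d_hat, alpha_t, alpha_tp1, b_xtp1, self_transition))
```

Repeating this update after every forward step maintains a running estimate of the average sojourn time of each state, which is what the duration-dependent transition probabilities $a_{ii}(\hat{\mathbf{d}}_t)$ used above require.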

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka and Kariya, Japan, October–November 1989), E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 2006.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52–53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.



[59] B Zhang L Zhang and J Xu ldquoRemaining useful life predic-tion for rolling element bearing based on ensemble learningrdquoChemical Engineering Transactions vol 33 pp 157ndash162 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 15: Hidden Semi-Markov Models for Predictive Maintenance

Mathematical Problems in Engineering 15

Table 2: Average absolute prediction error (APE) of the RUL estimation using the proposed state duration estimator of (20).

(a) APE of the RUL estimation for the continuous observation test cases

Test case | Gaussian: APEavg / APEup / APElow | Gamma: APEavg / APEup / APElow | Weibull: APEavg / APEup / APElow
1         | 51 / 170 / 67   | 140 / 290 / 091 | 45 / 170 / 81
2         | 76 / 190 / 50   | 61 / 210 / 85   | 66 / 190 / 61
3         | 77 / 54 / 190   | 29 / 120 / 170  | 160 / 290 / 30
4         | 90 / 210 / 29   | 75 / 220 / 68   | 60 / 190 / 67
5         | 73 / 190 / 47   | 22 / 140 / 140  | 39 / 170 / 87
6         | 65 / 180 / 56   | 51 / 180 / 100  | 140 / 270 / 27
7         | 47 / 160 / 75   | 48 / 170 / 110  | 12 / 130 / 120
8         | 100 / 220 / 29  | 52 / 180 / 100  | 92 / 220 / 39
9         | 31 / 92 / 140   | 20 / 160 / 130  | 82 / 210 / 49
10        | 64 / 180 / 56   | 75 / 220 / 69   | 33 / 120 / 130
Average   | 68 / 170 / 74   | 57 / 190 / 99   | 73 / 200 / 70

(b) APE of the RUL estimation for the discrete observation test cases

Test case | Gaussian: APEavg / APEup / APElow | Gamma: APEavg / APEup / APElow | Weibull: APEavg / APEup / APElow
1         | 21 / 110 / 140  | 31 / 88 / 140   | 24 / 120 / 130
2         | 21 / 110 / 130  | 110 / 220 / 33  | 190 / 320 / 71
3         | 51 / 170 / 76   | 66 / 180 / 51   | 23 / 140 / 110
4         | 59 / 65 / 180   | 52 / 170 / 67   | 42 / 160 / 90
5         | 32 / 140 / 100  | 83 / 190 / 34   | 120 / 240 / 29
6         | 120 / 240 / 27  | 62 / 180 / 52   | 41 / 84 / 160
7         | 29 / 150 / 97   | 93 / 210 / 23   | 190 / 310 / 66
8         | 150 / 270 / 70  | 74 / 180 / 43   | 43 / 170 / 94
9         | 59 / 180 / 77   | 110 / 230 / 55  | 39 / 160 / 88
10        | 35 / 110 / 140  | 55 / 60 / 160   | 52 / 170 / 71
Average   | 57 / 150 / 100  | 74 / 170 / 66   | 77 / 190 / 90

housing a temperature probe and two accelerometers (one on the vertical and one on the horizontal axis).

The platform is composed of three main parts: a rotating part, a load profile generation part, and a measurement part, as illustrated in Figure 6.

The rotating part is composed of an asynchronous motor, which develops a power equal to 250 W, two shafts, and a gearbox that allows the motor to reach its rated speed of 2830 rpm. The motor's rotation speed and direction are set through a human-machine interface.

The load profile part applies a radial force on the external ring of the test bearing through a pneumatic jack connected to a lever arm, which indirectly transmits the load through a clamping ring. The goal of the applied radial force is to accelerate the bearing's degradation process.

Table 3: Average absolute prediction error (APE) of the RUL estimation using the state duration estimator of (16) introduced by Azimi et al. [30–32].

(a) APE of the RUL estimation for the continuous observation test cases

Test case | Gaussian: APEavg / APEup / APElow | Gamma: APEavg / APEup / APElow | Weibull: APEavg / APEup / APElow
1         | 578 / 510 / 668 | 262 / 97 / 527  | 259 / 284 / 646
2         | 502 / 444 / 577 | 213 / 170 / 469 | 290 / 192 / 708
3         | 503 / 447 / 573 | 271 / 87 / 565  | 345 / 139 / 734
4         | 518 / 460 / 604 | 213 / 143 / 459 | 349 / 171 / 787
5         | 594 / 537 / 662 | 290 / 95 / 554  | 334 / 156 / 749
6         | 580 / 517 / 671 | 258 / 83 / 541  | 231 / 258 / 665
7         | 594 / 536 / 669 | 182 / 125 / 477 | 360 / 171 / 744
8         | 634 / 556 / 723 | 194 / 157 / 441 | 348 / 178 / 770
9         | 491 / 435 / 570 | 145 / 171 / 432 | 251 / 267 / 670
10        | 544 / 484 / 628 | 232 / 79 / 527  | 241 / 245 / 674
Average   | 554 / 493 / 635 | 226 / 121 / 499 | 301 / 206 / 715

(b) APE of the RUL estimation for the discrete observation test cases

Test case | Gaussian: APEavg / APEup / APElow | Gamma: APEavg / APEup / APElow | Weibull: APEavg / APEup / APElow
1         | 514 / 410 / 624 | 424 / 318 / 530 | 326 / 264 / 736
2         | 496 / 399 / 604 | 595 / 483 / 708 | 313 / 276 / 693
3         | 502 / 386 / 623 | 465 / 357 / 574 | 324 / 257 / 702
4         | 422 / 315 / 538 | 501 / 405 / 606 | 237 / 361 / 603
5         | 443 / 339 / 558 | 478 / 374 / 591 | 360 / 256 / 765
6         | 522 / 432 / 627 | 552 / 443 / 669 | 272 / 316 / 643
7         | 550 / 439 / 668 | 560 / 457 / 670 | 347 / 232 / 744
8         | 503 / 390 / 620 | 604 / 505 / 710 | 351 / 264 / 724
9         | 555 / 474 / 640 | 480 / 372 / 595 | 318 / 222 / 736
10        | 490 / 382 / 607 | 521 / 412 / 631 | 294 / 289 / 687
Average   | 500 / 397 / 611 | 518 / 413 / 629 | 314 / 274 / 704

The measurement part consists of a data acquisition card connected to the monitoring sensors, which provides the user with the measured temperature and vibration data. The vibration measurements are provided in snapshots of 0.1 s collected every 10 seconds at a sampling frequency of 25.6 kHz (2560 samples per snapshot), while the temperature has been continuously recorded at a sampling frequency of 10 Hz (600 samples collected each minute).

Further details on the Pronostia test rig can be found in the data presentation paper [49] and on the web page of the data challenge (http://www.femto-st.fr/en/Research-departments/AS2M/Research-groups/PHM/IEEE-PHM-2012-Data-challenge.php).


Table 4: Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset [49].

Condition 1 (1800 rpm and 4000 N) | Condition 2 (1650 rpm and 4200 N) | Condition 3 (1500 rpm and 5000 N)
Bearing     Lifetime [s]          | Bearing     Lifetime [s]          | Bearing     Lifetime [s]
Bearing1_1  28030                 | Bearing2_1  9110                  | Bearing3_1  5150
Bearing1_2  8710                  | Bearing2_2  7970                  | Bearing3_2  16370
Bearing1_3  23750                 | Bearing2_3  19550                 | Bearing3_3  4340
Bearing1_4  14280                 | Bearing2_4  7510                  |
Bearing1_5  24630                 | Bearing2_5  23110                 |
Bearing1_6  24480                 | Bearing2_6  7010                  |
Bearing1_7  22590                 | Bearing2_7  2300                  |

Figure 6: Global overview of the Pronostia experimental platform [19], showing the rotating module, the load module, the tested bearing, and the data acquisition module.

Regarding the data provided for the PHM 2012 challenge, 3 different operating conditions were considered:

(i) first operating condition: speed of 1800 rpm and load of 4000 Newton;

(ii) second operating condition: speed of 1650 rpm and load of 4200 Newton;

(iii) third operating condition: speed of 1500 rpm and load of 5000 Newton.

Under the above operating conditions, a total of 17 accelerated life tests were realized on bearings of type NSK 6804DD, which can operate at a maximum speed of 13000 rpm and with a load limit of 4000 N. The tests were stopped when the amplitude of the vibration signal exceeded 20 g; this moment was thus defined as the bearing failure time. An example of a bearing before and after the experiment is shown in Figure 7, together with the corresponding vibration signal collected during the whole test.

Table 4 reports how the 17 tested bearings were separated into the three operating conditions. Moreover, the duration of each experiment, being the RUL to be predicted for each bearing, is also given. We performed two sets of experiments by considering, respectively, the bearings relative to the first and the second operating condition (i.e., Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

As already mentioned, the available data correspond to normally degraded bearings, meaning that the defects were not initially induced and that each degraded bearing contains almost all the types of defects (balls, rings, and cage), faithfully resembling a common real industrial situation. Moreover, no assumption about the type of failure to occur is provided with the data, and since the variability in experiment durations is high (from 1 h to 7 h), performing good estimates of the RUL is a difficult task [49].

In our experiments we considered as input to our model the horizontal channel of the accelerometer. We preprocessed the raw signals by extracting two time-domain features, that is, root mean square (RMS) and kurtosis, within windows of the same length as the given snapshots ($L = 2560$). Let $r_w(t)$ be the raw signal of the $w$th window; for each $w$ we estimate the RMS as

$$x^{\mathrm{RMS}}_{w} = \sqrt{\frac{1}{L}\sum_{t=1}^{L} r_w^2(t)}$$

and the kurtosis as

$$x^{\mathrm{KURT}}_{w} = \frac{(1/L)\sum_{t=1}^{L}\bigl(r_w(t) - \bar{r}_w\bigr)^4}{\Bigl((1/L)\sum_{t=1}^{L}\bigl(r_w(t) - \bar{r}_w\bigr)^2\Bigr)^2},$$

where $\bar{r}_w$ is the mean of $r_w$. An example of feature extraction for Bearing1_1 is shown in Figure 8; a sketch of this preprocessing step is given below.

To assess the performance of the proposed HSMM after the model selection procedure, we implemented a leave-one-out cross validation scheme: considering separately conditions 1 and 2, we performed the online RUL estimation for each of the 7 bearings using an HSMM trained with the remaining 6 bearing histories. Similarly to the simulated case, we considered the average absolute prediction error defined in (64) to quantitatively evaluate our method.
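As an illustration of the windowed RMS and kurtosis extraction described above, the following is a minimal sketch, assuming the raw horizontal-channel samples are available as a NumPy array; function and variable names are illustrative and not part of the original toolchain.

import numpy as np

def extract_features(r, L=2560):
    # Split a raw vibration signal into non-overlapping windows of length L
    # and compute the RMS and kurtosis features used as HSMM observations.
    n_windows = len(r) // L
    features = np.empty((n_windows, 2))
    for w in range(n_windows):
        r_w = r[w * L:(w + 1) * L]
        rms = np.sqrt(np.mean(r_w ** 2))
        centered = r_w - r_w.mean()
        # Kurtosis as the ratio of the 4th central moment to the squared variance
        kurt = np.mean(centered ** 4) / (np.mean(centered ** 2) ** 2)
        features[w] = (rms, kurt)
    return features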

5.2.2. Bearings RUL Estimation. We performed our experiments in two steps: firstly, we applied model selection in order to determine an optimal model structure, and secondly, we estimated the RUL of the bearings. The full procedure is detailed in the following.

(A) HSMM Structure. To determine an appropriate HSMM structure for effectively modeling the considered data, we considered several HSMM structures characterized by (i) the duration distribution family (being Gaussian, Gamma, or Weibull), (ii) an increasing number of states N, from 2 to 6, and (iii) an increasing number of Gaussian mixtures M in the observation density, from 1 to 4. For each model structure, obtained by systematically considering all the combinations of (i) to (iii), we ran 120 parameter learnings, corresponding to 120 random initializations λ0, on the data sets (Bearing1_1, Bearing1_2, ..., Bearing1_7 and Bearing2_1, Bearing2_2, ..., Bearing2_7).

Figure 7: A tested bearing before and after the experiment, with its recorded vibration signal [49].

Figure 8: Raw vibration data (a) versus the extracted RMS and kurtosis features (b) for Bearing1_1.

Similar to Section 5.1.2, at the end of each learning we evaluated the AIC value (Equation (46)), as reported in Figures 9(a) and 9(b) for conditions 1 and 2, respectively. In both cases the global minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and an M = 1 Gaussian mixture for the observation density.
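The structure search described above can be sketched as follows. This is only a sketch under stated assumptions: ParametricHSMM is a hypothetical class standing in for the paper's modified HSMM learning procedure, and the AIC is computed in its standard form -2 ln L + 2k, which may differ from the exact normalization of Equation (46).

import numpy as np

def select_structure(sequences, families=("gaussian", "gamma", "weibull"),
                     n_states_range=range(2, 7), n_mixtures_range=range(1, 5),
                     n_restarts=120, seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for family in families:
        for n_states in n_states_range:
            for n_mix in n_mixtures_range:
                for _ in range(n_restarts):  # 120 random initializations lambda_0
                    # Hypothetical HSMM with parametric duration model
                    model = ParametricHSMM(n_states, n_mix, duration=family,
                                           random_state=rng)
                    model.fit(sequences)     # modified EM learning
                    # AIC = -2 log-likelihood + 2 * number of free parameters
                    aic = -2.0 * model.log_likelihood(sequences) \
                          + 2.0 * model.n_parameters()
                    if best is None or aic < best[0]:
                        best = (aic, family, n_states, n_mix, model)
    return best

The quadruple loop mirrors the procedure in the text: every combination of duration family, number of states, and number of mixtures is learned 120 times from random starts, and the structure with the lowest AIC is retained.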

(B) RUL Estimation. Using the above obtained optimal HSMM structure, we trained it via a leave-one-out cross validation scheme by using, for condition 1, at each iteration Bearing1_i, 1 ≤ i ≤ 7, as the testing bearing, while the remaining six bearings were used for training. Once the trained parameters λ*_i were estimated for the i-th testing bearing, we progressively collected the observations of the tested Bearing1_i to calculate, at each time t, the average, lower, and upper RUL as specified in (57), (58), and (59), respectively, considering the state S_4 as the failure state. The same procedure has been performed for the bearings in condition 2.

Figure 9: AIC values versus the number of states for the Gaussian, Gamma, and Weibull duration models: (a) Condition 1; (b) Condition 2. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.

Figure 10: RUL estimation (true, average, lower, and upper RUL versus time) for (a) Bearing1_7 and (b) Bearing2_6. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.

Examples of RUL estimation for Bearing1_7 and Bearing2_6 are shown in Figures 10(a) and 10(b), respectively, where the black solid line represents the real RUL, which goes to zero as time goes on. As can be seen, the average as well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.
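A minimal sketch of this online evaluation loop is given below. It is a sketch under stated assumptions: predict_rul is a hypothetical method standing in for the estimators of (57)-(59), dt = 10 s is the interval between snapshots, and the error metric assumes that (64) is the time average of the absolute difference between the true and the estimated RUL.

import numpy as np

def evaluate_bearing(model, features, failure_time, dt=10.0):
    # Online, progressive RUL estimation for one held-out test bearing,
    # returning the average absolute prediction errors of the average,
    # lower, and upper RUL estimates.
    errors_avg, errors_low, errors_up = [], [], []
    for t in range(1, len(features) + 1):
        rul_avg, rul_low, rul_up = model.predict_rul(features[:t])  # hypothetical API
        true_rul = failure_time - t * dt
        errors_avg.append(abs(true_rul - rul_avg))
        errors_low.append(abs(true_rul - rul_low))
        errors_up.append(abs(true_rul - rul_up))
    return np.mean(errors_avg), np.mean(errors_low), np.mean(errors_up)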

Concerning the quantitative estimation of the predictive performance, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for condition 1 and 1 hour and 5 minutes for condition 2, which are good values considering the high variability in the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions, the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for condition 1 and further decreases to 14 minutes for condition 2.

Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1

Test bearing | APEavg  | APElow  | APEup
Bearing1_1   | 10571.6 | 12723.0 | 9414.6
Bearing1_2   | 4331.2  | 3815.6  | 3821.3
Bearing1_3   | 2997.0  | 9730.9  | 6091.2
Bearing1_4   | 6336.3  | 2876.6  | 14871.9
Bearing1_5   | 1968.9  | 7448.4  | 10411.5
Bearing1_6   | 4253.0  | 9896.4  | 9793.7
Bearing1_7   | 1388.0  | 7494.3  | 10088.1
Average      | 4549.4  | 7712.2  | 9213.2

(b) Condition 2

Test bearing | APEavg  | APElow  | APEup
Bearing2_1   | 2475.9  | 5006.5  | 7287.5
Bearing2_2   | 1647.3  | 4497.2  | 8288.6
Bearing2_3   | 8877.1  | 9508.3  | 7962.1
Bearing2_4   | 1769.8  | 4248.6  | 4982.5
Bearing2_5   | 8663.1  | 10490.0 | 10730.0
Bearing2_6   | 877.1   | 3504.7  | 6687.0
Bearing2_7   | 3012.5  | 3866.4  | 6651.9
Average      | 3903.3  | 5874.5  | 7512.8
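As a quick check, the durations quoted above correspond to the values reported in Table 5:

$$4549.4\,\mathrm{s} \approx 75.8\,\mathrm{min}\ (\approx 1\,\mathrm{h}\,15\,\mathrm{min}), \qquad 3903.3\,\mathrm{s} \approx 65.1\,\mathrm{min}\ (\approx 1\,\mathrm{h}\,5\,\mathrm{min}),$$
$$1388.0\,\mathrm{s} \approx 23.1\,\mathrm{min}, \qquad 877.1\,\mathrm{s} \approx 14.6\,\mathrm{min}.$$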

6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure due to the few required parameters to estimate. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and correct duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a determinate event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM model with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t)\,\alpha_t(i)\,b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\,\bigl(\bar{d}_t(i)+1\bigr). \quad \text{(A1)}$$

The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution

$$d_t(i) \sim f(d). \quad \text{(A2)}$$

We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$ and knowing that the current state is $i$, as

$$P\bigl(d_t(i)=d\bigr) = P\bigl(s_{t-d-1}\neq S_i,\, s_{t-d}=S_i,\ldots, s_{t-1}=S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t,\,\lambda\bigr). \quad \text{(A3)}$$

We omit the conditioning on the model parameters $\lambda$ in the following equations, as it is inherently implied. We are interested in deriving the estimator $\bar{d}_t(i)$ of $d_t(i)$, defined as its expected value (see Equation (15)):

$$\bar{d}_t(i) = E\bigl(d_t(i) \mid s_t=S_i,\,\mathbf{x}_1\mathbf{x}_2\cdots\mathbf{x}_t\bigr), \quad 1\le i\le N. \quad \text{(A4)}$$

From the definition of expectation we have

$$\bar{d}_t(i) = \sum_{d=1}^{t} d\cdot P\bigl(d_t(i)=d\bigr) = \sum_{d=1}^{t} d\cdot P\bigl(s_{t-d-1}\neq S_i,\, s_{t-d}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr). \quad \text{(A5)}$$

For $\bar{d}_{t+1}(i)$ we have

$$\bar{d}_{t+1}(i) = \sum_{d=1}^{t+1} d\cdot P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \underbrace{P\bigl(s_{t-1}\neq S_i,\, s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{(a)} \quad \text{(A6)}$$

$$+ \sum_{d=2}^{t+1} d\cdot P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr). \quad \text{(A7)}$$

By noticing that

$$P\bigl(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \frac{P\bigl(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i, s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}{P\bigl(s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}, \quad \text{(A8)}$$

we can replace the probability of the second term of (A7) with

$$P\bigl(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i, s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \underbrace{P\bigl(s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{(b)} \quad \text{(A9)}$$

$$\cdot\; P\bigl(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr). \quad \text{(A10)}$$

In the last factor of (A10) we can omit the information about the current state and observation by observing that

$$P\bigl(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i, s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) \approx \underbrace{P\bigl(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)}_{(c)}, \quad \text{(A11)}$$

if the following independencies hold:

$$s_{t+1} \perp \{s_{t-d+1},\ldots,s_{t-1}\} \mid s_t,\,\mathbf{x}_1,\ldots,\mathbf{x}_t, \qquad \mathbf{x}_{t+1} \perp \{s_{t-d+1},\ldots,s_{t-1}\} \mid s_t,\,\mathbf{x}_1,\ldots,\mathbf{x}_t, \quad \text{(A12)}$$

where with $\perp$ we denote independence. Equation (A12) holds for HMMs (even without conditioning on $\mathbf{x}_1,\ldots,\mathbf{x}_t$), but it does not hold for HSMMs, since the state duration (expressed by $s_{t-d+1},\ldots,s_{t-1}$) determines the system evolution. On the other hand, the state duration is partially known through the observations $\mathbf{x}_1,\ldots,\mathbf{x}_t$; thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A6), (A9), and (A11) we obtain

$$\begin{aligned}
\bar{d}_{t+1}(i) ={}& (a) + \sum_{d=2}^{t+1} d\cdot(b)\cdot(c) \\
={}& \underbrace{P\bigl(s_{t-1}\neq S_i,\, s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{P(A,B\mid C)=P(A\mid B,C)\,P(B\mid C)} + \sum_{d=2}^{t+1} d\cdot \underbrace{P\bigl(s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{\text{does not depend on } d}\cdot P\bigl(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) \\
={}& P\bigl(s_{t-1}\neq S_i \mid s_t=S_i, s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot P\bigl(s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) + P\bigl(s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot\sum_{d=2}^{t+1} d\cdot P\bigl(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) \\
={}& P\bigl(s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot\Bigl[\underbrace{P\bigl(s_{t-1}\neq S_i \mid s_t=S_i, s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_t,\mathbf{x}_{t+1}\bigr)}_{\approx\, P(s_{t-1}\neq S_i \mid s_t=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_t)\ \text{for the approximation of (A11)}} + \sum_{d=2}^{t+1} d\cdot P\bigl(s_{t-d}\neq S_i, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)\Bigr] \\
={}& P\bigl(s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot\Bigl[P\bigl(s_{t-1}\neq S_i \mid s_t=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) + \sum_{d'=1}^{t} (d'+1)\cdot P\bigl(s_{t-d'-1}\neq S_i, s_{t-d'}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)\Bigr].
\end{aligned} \quad \text{(A13)}$$

Noticing that

$$\sum_{d'=1}^{t} P\bigl(s_{t-d'-1}\neq S_i, s_{t-d'}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) + P\bigl(s_{t-1}\neq S_i \mid s_t=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) = 1, \quad \text{(A14)}$$

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A13) as follows:

$$\bar{d}_{t+1}(i) = P\bigl(s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot\bigl(\bar{d}_t(i)+1\bigr). \quad \text{(A15)}$$

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state $i$ in the previous step.

In order to express (A15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar{d}_{t+1}(i)$, we can consider the following equality:

$$P\bigl(s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \frac{P\bigl(s_t=S_i, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}{\underbrace{P\bigl(s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{\gamma_{t+1}(i)}}. \quad \text{(A16)}$$

If we consider the terms involved in the probability at the numerator of the right-hand side of (A16), we have that

$$\underbrace{\mathbf{x}_1,\ldots,\mathbf{x}_t}_{B} \;\perp\; \underbrace{\mathbf{x}_{t+1}}_{C} \;\Big|\; \underbrace{s_t=S_i,\, s_{t+1}=S_i}_{A}. \quad \text{(A17)}$$

If $B \perp C \mid A$, by the Bayes rule we have that

$$P(A\mid C,B) = \frac{P(C\mid A,B)\cdot P(A\mid B)}{P(C\mid B)}. \quad \text{(A18)}$$

Hence we can rewrite the numerator of the right-hand side of (A16) as follows:

$$P\bigl(s_t=S_i, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \frac{P\bigl(s_t=S_i, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)\cdot \overbrace{P\bigl(\mathbf{x}_{t+1}\mid s_t=S_i, s_{t+1}=S_i\bigr)}^{\mathbf{x}_{t+1}\perp s_t \mid s_{t+1}}}{P\bigl(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)} = \frac{P\bigl(s_{t+1}=S_i \mid s_t=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)\cdot\overbrace{P\bigl(s_t=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)}^{\gamma_t(i)}\cdot\overbrace{P\bigl(\mathbf{x}_{t+1}\mid s_{t+1}=S_i\bigr)}^{b_i(\mathbf{x}_{t+1})}}{P\bigl(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)}. \quad \text{(A19)}$$

The first probability in the numerator of (A19) is the state transition, which can be approximated by considering the average duration as

$$P\bigl(s_{t+1}=S_i \mid s_t=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) = \sum_{d_t} a_{ii}(d_t)\cdot P\bigl(d_t\mid\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) \approx a_{ii}(\bar{d}_t), \quad \text{(A20)}$$

while the denominator of (A19) can be expressed as follows:

$$P\bigl(\mathbf{x}_{t+1}\mid\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) = \frac{P\bigl(\mathbf{x}_1,\ldots,\mathbf{x}_t,\mathbf{x}_{t+1}\bigr)}{P\bigl(\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)} = \frac{\sum_{i=1}^{N}\alpha_{t+1}(i)}{\sum_{i=1}^{N}\alpha_t(i)}. \quad \text{(A21)}$$

By substituting (A20) and (A21) in (A19) we obtain

$$P\bigl(s_t=S_i, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \frac{a_{ii}(\bar{d}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N}\alpha_{t+1}(i)}, \quad \text{(A22)}$$

and then, by combining (A22) and (A16), we obtain

$$P\bigl(s_t=S_i \mid s_{t+1}=S_i,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \frac{a_{ii}(\bar{d}_t)\cdot\gamma_t(i)\cdot\sum_{i=1}^{N}\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\cdot\sum_{i=1}^{N}\alpha_{t+1}(i)}. \quad \text{(A23)}$$

Finally, by substituting (A23) in (A15) and considering that

$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N}\alpha_t(i)}, \quad \text{(A24)}$$

we derive the induction formula for $\bar{d}_{t+1}(i)$ in terms of model parameters as

$$\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{d}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\bigl(\bar{d}_t(i)+1\bigr). \quad \text{(A25)}$$
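As a numerical illustration of the induction (A25), a minimal sketch is given below. It assumes that the forward variables and the duration-dependent self-transition probabilities are available from the modified forward pass; the function name and array layout are illustrative, not part of the original implementation.

import numpy as np

def update_duration_estimate(d_bar, alpha_t, alpha_t1, a_self, b_x1):
    # One step of the induction (A25) for the expected state durations.
    #   d_bar    : current estimates d_bar_t(i), i = 1..N
    #   alpha_t  : forward variables alpha_t(i)
    #   alpha_t1 : forward variables alpha_{t+1}(i)
    #   a_self   : self-transition probabilities a_ii(d_bar_t), evaluated at d_bar_t
    #   b_x1     : observation densities b_i(x_{t+1})
    eps = np.finfo(float).tiny  # guard against division by zero
    weight = a_self * alpha_t * b_x1 / np.maximum(alpha_t1, eps)
    return weight * (d_bar + 1.0)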

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka, Japan and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of gaussians hidden markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of gaussians hidden markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (hsmms)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hide markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthen, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 2006.
[47] G. D. Forney Jr., "The viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the em algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boskoski, M. Gasperin, D. Petelin, and Đ. Juricic, "Bearing fault prognostics using Renyi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 16: Hidden Semi-Markov Models for Predictive Maintenance

16 Mathematical Problems in Engineering

Table 4 Lifetime duration (in seconds) and operating conditions of the bearings tested in the IEEE PHM 2012 Prognostic Challenge dataset[49]

Condition 1 Condition 2 Condition 31800 rpm and 4000N 1650 rpm and 4200N 1500 rpm and 5000N

Bearing Lifetime [s] Bearing Lifetime [s] Bearing Lifetime [s]Bearing1 1 28030 Bearing2 1 9110 Bearing3 1 5150Bearing1 2 8710 Bearing2 2 7970 Bearing3 2 16370Bearing1 3 23750 Bearing2 3 19550 Bearing3 3 4340Bearing1 4 14280 Bearing2 4 7510Bearing1 5 24630 Bearing2 5 23110Bearing1 6 24480 Bearing2 6 7010Bearing1 7 22590 Bearing2 7 2300

Loadmodule

Testedbearing

Dataacquisitionmodule

Rotatingmodule

Figure 6 Global overview of the Pronostia experimental platform[19]

Regarding the data provided for the PHM 2012 challenge3 different operating conditions were considered

(i) first operating conditions speed of 1800 rpm and loadof 4000 Newton

(ii) second operating conditions speed of 1650 rpm andload of 4200 Newton

(iii) third operating conditions speed of 1500 rpm andload of 5000 Newton

Under the above operating conditions a total of 17 accel-erated life tests were realized on bearings of type NSK 6804DD which can operate at a maximum speed of 13000 rpmand a load limit of 4000N The tests were stopped whenthe amplitude of the vibration signal was higher than 20 gthus this moment was defined as the bearing failure time Anexample of bearing before and after the experiment is shownin Figure 7 together with the corresponding vibration signalcollected during the whole test

Table 4 reports how the 17 tested bearings were separatedinto the three operating conditions Moreover the durationof each experiment being the RUL to be predicted for eachbearing is also given We performed two sets of experimentsby considering respectively the bearings relative to thefirst and the second operating condition (ie Bearing1 1Bearing1 2 Bearing1 7 and Bearing2 1 Bearing2 2 Bearing2 7)

As already mentioned the available data correspondto normally degraded bearings meaning that the defects

were not initially induced and that each degraded bearingcontains almost all the types of defects (balls rings and cage)resembling faithfully a common real industrial situationMoreover no assumption about the type of failure to beoccurred is provided with the data and since the variabilityin experiment durations is high (from 1 h to 7 h) performinggood estimates of the RUL is a difficult task [49]

In our experiments we considered as input to our modelthe horizontal channel of the accelerometerWe preprocessedthe raw signals by extracting two time-domain features thatis root mean square (RMS) and kurtosis within windowsof the same length as the given snapshots (119871 = 2560) Let119903119908(119905) be the raw signal of the 119908th window for each 119908 we

estimate RMS as 119909RMS119908

= radic(1119871)sum119871

119905=11199032119908(119905) and kurtosis as

119909KURT119908

= ((1119871)sum119871

119905=1(119903

119908(119905) minus 119903

119908)4)((1119871)sum

119871

119905=1(119903

119908(119905) minus 119903

119908)2)2

where 119903119908is the mean of 119903

119908 An example of feature extraction

for Bearing1 1 is shown in Figure 8To assess the performance of the proposed HSMM after

the model selection procedure we implemented a leave-one-out cross validation scheme by considering separatelyconditions 1 and 2 we performed the online RUL estimationfor each of the 7 bearings using an HSMM trained with theremaining 6 bearing histories Similarly to the simulated casewe considered the average absolute prediction error definedin (64) to quantitatively evaluate our method

522 Bearings RUL Estimation We performed our exper-iments in two steps firstly we applied model selection inorder to determine an optimalmodel structure and secondlywe estimated the RUL of the bearings The full procedure isdetailed in the following

(A) HSMM Structure To determine an appropriate HSMMstructure for effectively modeling the considered data weconsidered several HSMM structures characterized by (i)the duration distribution family (being Gaussian Gamma orWeibull) (ii) an increasing number of states119873 from 2 to 6and (iii) an increasing number of Gaussian mixtures 119872 inthe observation density from 1 to 4 For eachmodel structureobtained by systematically considering all the combinationsof (i) to (iii) we run 120 parameter learnings correspondingto 120 random initializations 1205820 on the data sets (Bearing1 1

Mathematical Problems in Engineering 17

times106

0 1 2 3 4 5 6 7

50

40

30

20

10

0

minus10

minus20

minus30

minus40

minus50

Figure 7 A tested bearing before and after the experiment with its recorded vibration signal [49]

(b)(a)

0 280300

7

Extracted feature RMS

Time (s)

0 280300

60

Time (s)

Extracted feature kurtosis

Featureextraction

Win

dow

1

Win

dow

2

Win

dow

3

Win

dow

Win

dow

r(t)

x1 x2 x3

middot middot middot

middot middot middot

nminus1

n

xnminus1 xn

Figure 8 Raw vibration data (a) versus RMS and kurtosis features (b) for Bearing1 1

Bearing1 2 Bearing1 7 and Bearing2 1 Bearing2 2 Bearing2 7)

Similar to Section 512 at the end of each learningwe evaluated the AIC value (Equation (46)) as reported inFigures 9(a) and 9(b) for conditions 1 and 2 respectively Inboth cases the global minimumAIC value corresponds to anHSMM with 119873 = 4 states a Weibull duration model and a119872 = 1 Gaussians mixture for the observation density

(B) RUL Estimation Using the above obtained optimalHSMM structure we trained it via a leave-one-out crossvalidation scheme by using for condition 1 at each iteration

Bearing1 i 1 le 119894 le 7 as the testing bearing while theremaining six bearings were used for training Once thetrained parameters 120582lowast

119894were estimated for the 119894th testing bear-

ing we progressively collected the observations of the testedBearing1 i to calculate at each time 119905 the average lower andupper RUL as specified in (57) (58) and (59) respectivelyconsidering the state 119878

4as the failure state The same proce-

dure has been performed for the bearings in condition 2Examples of RUL estimation for Bearing1 7 and Bear-

ing2 6 are shown in Figures 10(a) and 10(b) respectivelywhere the black solid line represents the real RUL which goesto zero as the time goes on As it can be seen the average as

18 Mathematical Problems in Engineering

2 3 4 5 60

005

01

015

02

025

03

Number of states

Gaussian duration modelGamma duration modelWeibull duration model

(a) AIC values for Condition 1

2 3 4 5 60

01

02

03

04

05

06

Number of states

Gaussian duration modelGamma duration modelWeibull duration model

(b) AIC values for Condition 2

Figure 9 In both cases the minimum AIC value corresponds to an HSMMwith119873 = 4 states a Weibull duration model and119872 = 1mixturein the observation density

0 22590

22590

45000

Time (s)

True RULUpper RUL

Average RULLower RUL

(a) RUL estimation for Bearing1 7

True RULUpper RUL

Average RULLower RUL

0 70100

7010

19000

Time (s)

(b) RUL estimation for Bearing2 6

Figure 10 By obtaining a low average absolute prediction error the proposed parametric HSMM is effective for estimating the RemainingUseful Lifetime of bearings

well as the lower and the upper bound estimations convergeto the real RUL as the real failure time approaches and theuncertainty about the estimation decreases with time

Concerning the quantitative estimation of the predictiveperformances we report in Table 5 the average absoluteprediction error of the RUL estimation (see Equation (64))expressed in seconds As it can be noticed the averageabsolute prediction error of the average RUL is respectively 1hour and 15minutes for condition 1 and 1 hour and 5minutesfor condition 2 which are good values considering the highvariability in the training set durations and the fact that theperformance metric takes into account also the less accuratepredictions performed at an early stage of the bearings lifeMoreover for both conditions the average prediction errors

of 5 tests out of 7 are below the average while the best averageerror of themeanRUL is only 23minutes for condition 1whileit further decreases to 14 minutes for condition 2

6 Conclusion and Future Work

In this paper we introduced an approach based on HiddenSemi-Markov Models (HSMM) and Akaike InformationCriteria (AIC) to perform (i) automatic model selection (ii)online condition monitoring and (iii) online time to eventestimation

The proposed HSMM models the state duration distri-bution with a parametric density allowing a less computa-tionally expensive learning procedure due to the few required

Mathematical Problems in Engineering 19

Table 5 Average absolute prediction error (APE) of the RUL esti-mation expressed in seconds

(a) Condition 1

Test Bearings APEavg APElow APEup

Bearing1 1 105716 127230 94146Bearing1 2 43312 38156 38213Bearing1 3 29970 97309 60912Bearing1 4 63363 28766 148719Bearing1 5 19689 74484 104115Bearing1 6 42530 98964 97937Bearing1 7 13880 74943 100881Average 45494 77122 92132

(b) Condition 2

Test Bearings APEavg APElow APEup

Bearing2 1 24759 50065 72875Bearing2 2 16473 44972 82886Bearing2 3 88771 95083 79621Bearing2 4 17698 42486 49825Bearing2 5 86631 104900 107300Bearing2 6 8771 35047 66870Bearing2 7 30125 38664 66519Average 39033 58745 75128

parameters to estimate Together with the provided generalmodel specification the modified learning inference andprediction algorithms allow the usage of any parametricdistribution tomodel the state duration as well as continuousor discrete observations As a consequence a wide class ofdifferent applications can be modeled with the proposedmethodology

This paper highlights through experiments performedon simulated data that the proposed approach is effectivein (i) automatically selecting the correct configuration of theHSMM in terms of number of states and correct durationdistribution family (ii) performing online state estimationand (iii) correctly predict the time to a determinate eventidentified as the entrance of the model in a target state Asa consequence the proposed parametric HSMM combinedwith AIC can be used in practice for condition monitoringand Remaining Useful Lifetime applications

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20) as

\[
\bar{d}_{t+1}(i) = \frac{a_{ii}\left(\bar{\mathbf{d}}_t\right) \cdot \alpha_t(i) \cdot b_i\left(\mathbf{x}_{t+1}\right)}{\alpha_{t+1}(i)} \cdot \left(\bar{d}_t(i) + 1\right). \tag{A1}
\]

The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution

\[
d_t(i) \sim f(d). \tag{A2}
\]

We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$, and knowing that the current state is $i$, as

\[
P\left(d_t(i) = d\right) = P\left(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \lambda\right). \tag{A3}
\]

We omit the conditioning on the model parameters $\lambda$ in the following equations, it being inherently implied. We are interested in deriving the estimator $\bar{d}_t(i)$ of $d_t(i)$, defined as its expected value (see Equation (15)):

\[
\bar{d}_t(i) = \mathrm{E}\left(d_t(i) \mid s_t = S_i, \mathbf{x}_1 \mathbf{x}_2 \cdots \mathbf{x}_t\right), \quad 1 \le i \le N. \tag{A4}
\]

From the definition of expectation we have

\[
\bar{d}_t(i) = \sum_{d=1}^{t} d \cdot P\left(d_t(i) = d\right)
= \sum_{d=1}^{t} d \cdot P\left(s_{t-d-1} \neq S_i, s_{t-d} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right). \tag{A5}
\]

For $\bar{d}_{t+1}(i)$ we have

\[
\begin{aligned}
\bar{d}_{t+1}(i) &= \sum_{d=1}^{t+1} d \cdot P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \\
&= \underbrace{P\left(s_{t-1} \neq S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{(a)}
\end{aligned} \tag{A6}
\]
\[
\quad + \sum_{d=2}^{t+1} d \cdot P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right). \tag{A7}
\]

By noticing that

\[
\begin{aligned}
&P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \\
&\quad = \frac{P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}{P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)},
\end{aligned} \tag{A8}
\]

we can replace the probability in the second term of (A7) with

\[
\begin{aligned}
&P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \\
&\quad = \underbrace{P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{(b)}
\end{aligned} \tag{A9}
\]
\[
\qquad \cdot P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right). \tag{A10}
\]

In the last factor of (A10) we can omit the information about the current state and observation by observing that

\[
\begin{aligned}
&P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \\
&\quad \approx \underbrace{P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right)}_{(c)}
\end{aligned} \tag{A11}
\]

if the following independencies hold:

\[
s_{t+1} \perp \left\{s_{t-d+1}, \ldots, s_{t-1}\right\} \mid s_t, \mathbf{x}_1, \ldots, \mathbf{x}_t,
\qquad
X_{t+1} \perp \left\{s_{t-d+1}, \ldots, s_{t-1}\right\} \mid s_t, \mathbf{x}_1, \ldots, \mathbf{x}_t, \tag{A12}
\]

where with $\perp$ we denote independence. Equation (A12) holds for HMMs (even without conditioning on $\mathbf{x}_1, \ldots, \mathbf{x}_t$), but it does not hold for HSMMs, since the state duration (expressed by $s_{t-d+1}, \ldots, s_{t-1}$) determines the system evolution. On the other hand, the state duration is partially known through the observations $\mathbf{x}_1, \ldots, \mathbf{x}_t$. Thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A6), (A9), and (A11) we obtain

\[
\begin{aligned}
\bar{d}_{t+1}(i) &= (a) + \sum_{d=2}^{t+1} d \cdot (b) \cdot (c) \\
&= \underbrace{P\left(s_{t-1} \neq S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{P(A,B \mid C) \,=\, P(A \mid B,C) \cdot P(B \mid C)}
+ \sum_{d=2}^{t+1} d \cdot \underbrace{P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{\text{does not depend on } d} \\
&\qquad \cdot P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \\
&= P\left(s_{t-1} \neq S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \\
&\qquad + P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot \sum_{d=2}^{t+1} d \cdot P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \\
&= P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot \Biggl[\, \underbrace{P\left(s_{t-1} \neq S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1}\right)}_{\approx\, P\left(s_{t-1} \neq S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \text{ by the approximation of (A11)}} \\
&\qquad + \sum_{d=2}^{t+1} d \cdot P\left(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \Biggr] \\
&= P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot \Biggl[ P\left(s_{t-1} \neq S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \\
&\qquad + \sum_{d'=1}^{t} \left(d' + 1\right) \cdot P\left(s_{t-d'-1} \neq S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \Biggr].
\end{aligned} \tag{A13}
\]

Noticing that

\[
\sum_{d'=1}^{t} P\left(s_{t-d'-1} \neq S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right)
+ P\left(s_{t-1} \neq S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) = 1, \tag{A14}
\]

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, and recalling the definition of $\bar{d}_t(i)$ in (A5), we can rewrite (A13) as follows:

\[
\bar{d}_{t+1}(i) = P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right) \cdot \left(\bar{d}_t(i) + 1\right). \tag{A15}
\]

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state $i$ at the previous step.

In order to express (A15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar{d}_{t+1}(i)$, we can consider the following equality:

\[
P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)
= \frac{P\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}{\underbrace{P\left(s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)}_{\gamma_{t+1}(i)}}. \tag{A16}
\]

If we consider the terms involved in the probability at the numerator of the right-hand side of (A16), we have that

\[
\underbrace{\mathbf{x}_1, \ldots, \mathbf{x}_t}_{B} \;\perp\; \underbrace{\mathbf{x}_{t+1}}_{C} \;\Big|\; \underbrace{s_t = S_i,\; s_{t+1} = S_i}_{A}. \tag{A17}
\]

If $B \perp C \mid A$, by the Bayes rule we have that

\[
P\left(A \mid C, B\right) = \frac{P\left(C \mid A, B\right) \cdot P\left(A \mid B\right)}{P\left(C \mid B\right)}. \tag{A18}
\]

Hence we can rewrite the numerator of the right-hand side of (A16) as follows:

\[
\begin{aligned}
P\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)
&= \frac{P\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \cdot \overbrace{P\left(\mathbf{x}_{t+1} \mid s_t = S_i, s_{t+1} = S_i\right)}^{\mathbf{x}_{t+1} \perp s_t \mid s_{t+1}}}{P\left(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right)} \\
&= \frac{P\left(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right) \cdot \overbrace{P\left(s_t = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right)}^{\gamma_t(i)} \cdot \overbrace{P\left(\mathbf{x}_{t+1} \mid s_{t+1} = S_i\right)}^{b_i(\mathbf{x}_{t+1})}}{P\left(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right)}.
\end{aligned} \tag{A19}
\]

The first probability in the numerator of (A19) is the state transition, which can be approximated by considering the average duration as

\[
P\left(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t\right)
= \sum_{d_t} a_{ii}\left(d_t\right) \cdot P\left(d_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right)
\approx a_{ii}\left(\bar{\mathbf{d}}_t\right), \tag{A20}
\]

while the denominator of (A19) can be expressed as follows:

\[
P\left(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t\right)
= \frac{P\left(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1}\right)}{P\left(\mathbf{x}_1, \ldots, \mathbf{x}_t\right)}
= \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}. \tag{A21}
\]

By substituting (A20) and (A21) in (A19) we obtain

\[
P\left(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)
= \frac{a_{ii}\left(\bar{\mathbf{d}}_t\right) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i\left(\mathbf{x}_{t+1}\right)}{\sum_{i=1}^{N} \alpha_{t+1}(i)}, \tag{A22}
\]

and then, by combining (A22) and (A16), we obtain

\[
P\left(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}\right)
= \frac{a_{ii}\left(\bar{\mathbf{d}}_t\right) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i\left(\mathbf{x}_{t+1}\right)}{\gamma_{t+1}(i) \cdot \sum_{i=1}^{N} \alpha_{t+1}(i)}. \tag{A23}
\]

Finally, by substituting (A23) in (A15) and considering that

\[
\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)}, \tag{A24}
\]

we derive the induction formula for $\bar{d}_{t+1}(i)$ in terms of model parameters as

\[
\bar{d}_{t+1}(i) = \frac{a_{ii}\left(\bar{\mathbf{d}}_t\right) \cdot \alpha_t(i) \cdot b_i\left(\mathbf{x}_{t+1}\right)}{\alpha_{t+1}(i)} \cdot \left(\bar{d}_t(i) + 1\right). \tag{A25}
\]
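To make the induction formula (A25) concrete, the following sketch performs one online update of the estimated mean state durations from the forward variables. The array shapes, the way the self-transition probabilities a_ii(d̄_t) are supplied, and the numerical values are assumptions chosen only for illustration.

```python
import numpy as np

def update_mean_durations(d_bar, alpha_t, alpha_t1, a_ii, b_x_t1):
    """One step of (A25): d_bar_{t+1}(i) =
    a_ii(d_bar_t) * alpha_t(i) * b_i(x_{t+1}) / alpha_{t+1}(i) * (d_bar_t(i) + 1)."""
    # Approximates P(s_t = S_i | s_{t+1} = S_i, x_1..x_{t+1}), the weight in (A15).
    weight = a_ii * alpha_t * b_x_t1 / alpha_t1
    return weight * (d_bar + 1.0)

# Hypothetical 4-state example with arbitrary forward variables and likelihoods.
d_bar = np.ones(4)
alpha_t = np.array([0.40, 0.30, 0.20, 0.10])
alpha_t1 = np.array([0.35, 0.30, 0.25, 0.10])
a_ii = np.array([0.90, 0.85, 0.80, 0.95])
b_x_t1 = np.array([0.50, 0.40, 0.60, 0.30])
d_bar = update_mean_durations(d_bar, alpha_t, alpha_t1, a_ii, b_x_t1)
print(d_bar)
```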

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry (Osaka and Kariya, Japan, October–November 1989), E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling industrial maintenance systems and the effects of automatic condition monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-based failure prediction: an extended hidden Markov model approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), Bari, Italy, June 2009, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Accountants, New York, NY, USA, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data transmission schemes for a new generation of interactive digital television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 2006.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher, et al., "Pronostia: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to Pronostia's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to Pronostia's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.


18 Mathematical Problems in Engineering

2 3 4 5 60

005

01

015

02

025

03

Number of states

Gaussian duration modelGamma duration modelWeibull duration model

(a) AIC values for Condition 1

2 3 4 5 60

01

02

03

04

05

06

Number of states

Gaussian duration modelGamma duration modelWeibull duration model

(b) AIC values for Condition 2

Figure 9 In both cases the minimum AIC value corresponds to an HSMMwith119873 = 4 states a Weibull duration model and119872 = 1mixturein the observation density

0 22590

22590

45000

Time (s)

True RULUpper RUL

Average RULLower RUL

(a) RUL estimation for Bearing1 7

True RULUpper RUL

Average RULLower RUL

0 70100

7010

19000

Time (s)

(b) RUL estimation for Bearing2 6

Figure 10 By obtaining a low average absolute prediction error the proposed parametric HSMM is effective for estimating the RemainingUseful Lifetime of bearings

well as the lower and the upper bound estimations convergeto the real RUL as the real failure time approaches and theuncertainty about the estimation decreases with time

Concerning the quantitative estimation of the predictiveperformances we report in Table 5 the average absoluteprediction error of the RUL estimation (see Equation (64))expressed in seconds As it can be noticed the averageabsolute prediction error of the average RUL is respectively 1hour and 15minutes for condition 1 and 1 hour and 5minutesfor condition 2 which are good values considering the highvariability in the training set durations and the fact that theperformance metric takes into account also the less accuratepredictions performed at an early stage of the bearings lifeMoreover for both conditions the average prediction errors

of 5 tests out of 7 are below the average while the best averageerror of themeanRUL is only 23minutes for condition 1whileit further decreases to 14 minutes for condition 2

6 Conclusion and Future Work

In this paper we introduced an approach based on HiddenSemi-Markov Models (HSMM) and Akaike InformationCriteria (AIC) to perform (i) automatic model selection (ii)online condition monitoring and (iii) online time to eventestimation

The proposed HSMM models the state duration distri-bution with a parametric density allowing a less computa-tionally expensive learning procedure due to the few required

Mathematical Problems in Engineering 19

Table 5 Average absolute prediction error (APE) of the RUL esti-mation expressed in seconds

(a) Condition 1

Test Bearings APEavg APElow APEup

Bearing1 1 105716 127230 94146Bearing1 2 43312 38156 38213Bearing1 3 29970 97309 60912Bearing1 4 63363 28766 148719Bearing1 5 19689 74484 104115Bearing1 6 42530 98964 97937Bearing1 7 13880 74943 100881Average 45494 77122 92132

(b) Condition 2

Test Bearings APEavg APElow APEup

Bearing2 1 24759 50065 72875Bearing2 2 16473 44972 82886Bearing2 3 88771 95083 79621Bearing2 4 17698 42486 49825Bearing2 5 86631 104900 107300Bearing2 6 8771 35047 66870Bearing2 7 30125 38664 66519Average 39033 58745 75128

parameters to estimate Together with the provided generalmodel specification the modified learning inference andprediction algorithms allow the usage of any parametricdistribution tomodel the state duration as well as continuousor discrete observations As a consequence a wide class ofdifferent applications can be modeled with the proposedmethodology

This paper highlights through experiments performedon simulated data that the proposed approach is effectivein (i) automatically selecting the correct configuration of theHSMM in terms of number of states and correct durationdistribution family (ii) performing online state estimationand (iii) correctly predict the time to a determinate eventidentified as the entrance of the model in a target state Asa consequence the proposed parametric HSMM combinedwith AIC can be used in practice for condition monitoringand Remaining Useful Lifetime applications

As the targeted application of the proposed methodologyis failure prognosis in industrial machines combining theproposed HSMM model with online learning procedurecapable of adapting the model parameter to new conditionswould be considered in a future extension

Appendix

In this appendix we give the derivation of the state durationvariable introduced in (20) as

119889119905+1

(119894) =119886119894119894(d

119905) sdot 120572

119905(119894) sdot 119887

119894(119883

119905+1)

120572119905+1

(119894)sdot (119889

119905(119894) + 1) (A1)

The random variable 119889119905(119894) has been defined in Section 21

as the duration spent in state 119894 prior to current time 119905 assumingthat the state at current time 119905 be 119894 119889

119905(119894) is sampled from an

arbitrary distribution

119889119905(119894) sim 119891 (119889) (A2)

We can specify the probability that the system has been instate 119894 for 119889 time units prior to current time 119905 giving theobservations and the model parameters 120582 and knowing thatthe current state is 119894 as

P (119889119905(119894) = 119889) = P (119904

119905minus119889minus1= 119878

119894 119904

119905minus119889= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905 120582)

(A3)

We omit the conditioning to the model parameters 120582 inthe following equations being inherently implied We areinterested to derive the estimator 119889

119905(119894) of 119889

119905(119894) defined as its

expected value (see Equation (15))

119889119905(119894) = E (119889

119905(119894) | 119904

119905= 119878

119894 x

1x2sdot sdot sdot x

119905) 1 le 119894 le 119873 (A4)

From the definition of expectation we have

119889119905(119894) =

119905

sum

119889=1

119889 sdot P (119889119905(119894) = 119889)

=

119905

sum

119889=1

119889 sdot P (119904119905minus119889minus1

= 119878119894 119904

119905minus119889= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905)

(A5)

For 119889119905+1

(119894) we have

119889119905+1

(119894) =

119905+1

sum

119889=1

119889 sdot P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905= 119878

119894|

119904119905+1

= 119878119894 x

1 x

119905+1)

= P (119904119905minus1

= 119878119894 119904

119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

(119886)

(A6)

+

119905+1

sum

119889=2

119889 sdot P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905= 119878

119894|

119904119905+1

= 119878119894 x

1 x

119905+1)

(A7)

By noticing that

P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 119904

119905+1= 119878

119894 x

1 x

119905+1)

= (P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894 119904

119905= 119878

119894|

119904119905+1

= 119878119894 x

1 x

119905+1))

sdot (P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1))

minus1

(A8)

20 Mathematical Problems in Engineering

we can replace the probability of the second termof (A7)with

P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894 119904

119905= 119878

119894|

119904119905+1

= 119878119894 x

1 x

119905+1)

= P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

(119887)

(A9)

sdot P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 119904

119905+1= 119878

119894 x

1 x

119905+1)

(A10)

In the last factor of (A10) we can omit the information aboutthe current state and observation by observing that

P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 119904

119905+1= 119878

119894 x

1 x

119905+1)

asymp P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894| 119904

119905= 119878

119894 x

1 x

119905)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

(119888)

(A11)

if the following independencies hold

119904119905+1

perp 119904119905minus119889+1

119904119905minus1

| 119904119905 x

1 x

119905

119883119905+1

perp 119904119905minus119889+1

119904119905minus1

| 119904119905 x

1 x

119905

(A12)

wherewithperpwe denote independency Equation (A12) holdsforHMMs (evenwithout conditioning on x

1 x

119905) but they

do not hold for HSMMs since the state duration (expressedby 119904

119905minus119889+1 119904

119905minus1) determines the system evolution On

the other hand state duration is partially known by theobservtions x

1 x

119905Thus the approximation is reasonable

as long as the uncertainty on the states is within limitsFrom (A6) (A9) and (A11) we obtain

119889119905+1

(119894) = (119886) +

119905+1

sum

119889=2

119889 sdot (119887) sdot (119888)

= P (119904119905minus1

= 119878119894 119904

119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

P(119860119861|119862)=P(119860|119861119862)sdotP(119861|119862)

+

119905+1

sum

119889=2

119889 sdot P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

it does not depend on 119889

sdot P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905)

= P (119904119905minus1

= 119878119894| 119904

119905= 119878

119894 119904

119905+1= 119878

119894 x

1 x

119905+1)

sdot P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)

+ P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)

sdot

119905+1

sum

119889=2

119889 sdot P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905)

= P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)

sdot [

[

P (119904119905minus1

= 119878119894| 119904

119905= 119878

119894119904

119905+1= 119878

119894 x

1 x

119905x

119905+1)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

for the approximation of (A11)

+

119905+1

sum

119889=2

119889 sdot P (119904119905minus119889

= 119878119894 119904

119905minus119889+1= 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905)]

]

= P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)

sdot [P (119904119905minus1

= 119878119894| 119904

119905= 119878

119894 x

1 x

119905) +

119905

sum

1198891015840=1

(1198891015840+ 1)

sdot P (119904119905minus1198891015840minus1

= 119878119894 119904

119905minus1198891015840 = 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905)]

(A13)

Noticing that

119905

sum

1198891015840=1

P (119904119905minus1198891015840minus1

= 119878119894 119904

119905minus1198891015840 = 119878

119894 119904

119905minus1= 119878

119894|

119904119905= 119878

119894 x

1 x

119905)

+ P (119904119905minus1

= 119878119894| 119904

119905= 119878

119894 x

1 x

119905) = 1

(A14)

because it represents the sum of the probabilities of all thepossible combinations of state sequences up to the currenttime 119905 we can rewrite (A13) as follows

119889119905+1

(119894) = P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1) sdot (119889

119905(119894) + 1)

(A15)

The intuition behind the latter induction formula is thatthe current average duration is the previous average durationplus 1 weighted with the ldquoamountrdquo of the current state thatwas already in state 119894 in the previous step

In order to transform (A15) in terms ofmodel parametersfor an easy numerical calculation of the induction for 119889

119905+1(119894)

we can consider the following equality

P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)

=P (119904

119905= 119878

119894 119904

119905+1= 119878

119894| x

1 x

119905+1)

P (119904119905+1

= 119878119894| x

1 x

119905+1)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

120574119905+1

(119894)

(A16)

If we consider the terms involved in the probability at thenumerator of the right-hand side of (A16) we have that

x1 x

119905⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

119861

perp x119905+1⏟⏟⏟⏟⏟⏟⏟

119862

| 119904119905= 119878

119894 119904

119905+1= 119878

119894⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

119860

(A17)

Mathematical Problems in Engineering 21

If 119861 perp 119862 | 119860 for the Bayes rule we have that

P (119860 | 119862 119861) =P (119862 | 119860119861) sdot P (119860 | 119861)

P (119862 | 119861) (A18)

Hence we can rewrite the numerator of the right-hand sideof (A16) as follows

P (119904119905= 119878

119894 119904

119905+1= 119878

119894| x

1 x

119905+1)

= (P (119904119905= 119878

119894 119904

119905+1= 119878

119894| x

1 x

119905)

sdot P(x119905+1

|

x119905+1

perp119904119905|119904119905+1

⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞119904119905= 119878

119894 119904

119905+1= 119878

119894))

sdot (P (x119905+1

| x1 x

119905))

minus1

= (P (119904119905+1

= 119878119894| 119904

119905= 119878

119894 x

1 x

119905)

sdot

120574119905(119894)

⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞P (119904

119905= 119878

119894| x

1 x

119905) sdot

119887119894(x119905+1

)

⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞P (x

119905+1| 119904

119905+1= 119878

119894))

sdot (P (x119905+1

| x1 x

119905))

minus1

(A19)

The first probability in the numerator of (A19) is the statetransition which can be approximated by considering theaverage duration as

P (119904119905+1

= 119878119894| 119904

119905= 119878

119894 x

1 x

119905)

= sum

119889119905

119886119894119894(d

119905) sdot P (119889

119905| x

1 x

119905)

asymp 119886119894119894(d

119905)

(A20)

while the denominator of (A19) can be expressed as follows

P (x119905+1

| x1 x

119905) =

P (x1 x

119905 x

119905+1)

P (x1 x

119905)

=sum

119873

119894=1120572119905+1

(119894)

sum119873

119894=1120572119905(119894)

(A21)

By substituting (A20) and (A21) in (A19) we obtain

P (119904119905= 119878

119894 119904

119905+1= 119878

119894| x

1 x

119905+1)

=119886119894119894(d

119905) sdot 120574

119905(119894) sdot sum

119873

119894=1120572119905(119894) sdot 119887

119894(x

119905+1)

sum119873

119894=1120572119905+1

(119894)

(A22)

and then by combining (A22) and (A16) we obtain

P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)

=119886119894119894(d

119905) sdot 120574

119905(119894) sdot sum

119873

119894=1120572119905(119894) sdot 119887

119894(x

119905+1)

120574119905+1

(119894) sum119873

119894=1120572119905+1

(119894)

(A23)

Finally by substituting (A23) in (A15) and considering that

120574119905(119894) =

120572119905(119894)

sum119873

119894=1120572119905(119894)

(A24)

we derive the induction formula for 119889119905+1

(119894) in terms of modelparameters as

119889119905+1

(119894) =119886119894119894(d

119905) sdot 120572

119905(119894) sdot 119887

119894(x

119905+1)

120572119905+1

(119894)sdot (119889

119905(119894) + 1) (A25)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] L Solomon ldquoEssential elements of maintenance improvementprogramsrdquo in Proceedings of the IFAC Workshopon ProductionControl in the Process Industry Osaka Japan and Kariya JapanOctober-November 1989 E Oshima and C van Rijn Eds pp195ndash198 Pergamon Press Oxford UK 1989

[2] T HonkanenModelling industrial maintenance systems and theeffects of automatic condition monitoring [PhD dissertation]Helsinki University of Technology Information and ComputerSystems in Automation 2004

[3] R Dekker ldquoApplications of maintenance optimization modelsa review and analysisrdquo Reliability Engineering amp System Safetyvol 51 no 3 pp 229ndash240 1996

[4] H Wang ldquoA survey of maintenance policies of deterioratingsystemsrdquo European Journal of Operational Research vol 139 no3 pp 469ndash489 2002

[5] AFNOR ldquoCondition monitoring and diagnostics of ma-chinesmdashprognosticsmdashpart 1 generalguidelinesrdquo Tech Rep NFISO 13381-1 2005

[6] F Salfner Event-based failure prediction an extended hiddenmarkov model approach [PhD thesis] Humboldt-Universitatzu Berlin Germany 2008

[7] C Domeniconi C-S Perng R Vilalta and S Ma ldquoA clas-sification approachfor prediction of target events in temporalsequencesrdquo in Proceedings of the 6th European Conference onPrinciples of Data Mining and Knowledge Discovery (PKDDrsquo02) pp 125ndash137 Springer LondonUK 2002 httpdlacmorgcitationcfmid=645806670309

[8] K Medjaher J-Y Moya and N Zerhouni ldquoFailure prognosticby using dynamic Bayesian networksrdquo in Dependable Controlof Discrete Systems 2nd IFACWorkshop on Dependable Controlof Discrete Systems (DCDS rsquo09) June 2009 Bari Italy MP Fanti and M Dotoli Eds vol 1 pp 291ndash296 Interna-tional Federation of Accountants New York NY USA 2009httphalarchives-ouvertesfrhal-00402938en

[9] A Sfetsos ldquoShort-term load forecasting with a hybrid cluster-ing algorithmrdquo IEE Proceedings Generation Transmission andDistribution vol 150 no 3 pp 257ndash262 2003

[10] R Vilalta and S Ma ldquoPredicting rare events in temporaldomainsrdquo in Proceedings of the 2nd IEEE International Confer-ence on Data Mining (ICDM rsquo02) pp 474ndash481 December 2002

[11] E Sutrisno H Oh A S S Vasan and M Pecht ldquoEstimationof remaining useful life of ball bearings using data driven

22 Mathematical Problems in Engineering

methodologiesrdquo in Proceedings of the IEEE Conference onPrognostics and HealthManagement (PHM rsquo12) pp 1ndash7 DenverColo USA June 2012

[12] K Goebel B Saha and A Saxena ldquoA comparison of threedata-driven techniques for prognosticsrdquo in Proceedings of the62nd Meeting of the Society for Machinery Failure PreventionTechnology April 2008

[13] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989

[14] F Cartella T Liu S Meganck J Lemeire and H Sahli ldquoOnlineadaptive learning of left-right continuous HMM for bearingscondition assessmentrdquo Journal of Physics Conference Seriesvol 364 Article ID 012031 2012 httpiopscienceioporg1742-65963641012031

[15] S Lee L Li and J Ni ldquoOnline degradation assessment andadaptive fault detection usingmodified hidden markov modelrdquoJournal of Manufacturing Science and Engineering vol 132 no2 Article ID 021010 11 pages 2010

[16] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA mixture of gaussians hiddenmarkov model for failure diag-nostic and prognosticrdquo in Proceedings of the IEEE InternationalConference on Automation Science and Engineering (CASE rsquo10)pp 338ndash343 Toronto Canada August 2010

[17] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoEstimation of the remaining useful life by using waveletpacket decomposition and HMMsrdquo in Proceedings of the IEEEAerospace Conference (AERO rsquo11) pp 1ndash10 IEEE ComputerSociety AIAA Big Sky Mont USA March 2011

[18] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA data-driven failure prognostics method based on mixtureof gaussians hidden markov modelsrdquo IEEE Transactions onReliability vol 61 no 2 pp 491ndash503 2012

[19] K Medjaher D A Tobon-Mejia and N Zerhouni ldquoRemaininguseful life estimation of critical components with applicationto bearingsrdquo IEEE Transactions on Reliability vol 61 no 2 pp292ndash302 2012

[20] J Ferguson ldquoVariable duration models for speechrdquo in Proceed-ings of the Symposium on the Application of Hidden MarkovModels to Text and Speech pp 143ndash179 October 1980

[21] K P Murphy ldquoHidden semi-Markov models (hsmms)rdquo TechRep University of British Columbia 2002 httpwwwcsubccasimmurphyk

[22] A Kundu T Hines J Phillips B D Huyck and L C VanGuilder ldquoArabic handwriting recognition using variable dura-tion HMMrdquo in Proceedings of the 9th International Conferenceon Document Analysis and Recognition (ICDAR rsquo07) vol 2 pp644ndash648 IEEE Washington DC USA September 2007

[23] M T Johnson ldquoCapacity and complexity of HMM durationmodeling techniquesrdquo IEEE Signal Processing Letters vol 12 no5 pp 407ndash410 2005

[24] J-T Chien and C-H Huang ldquoBayesian learning of speechduration modelsrdquo IEEE Transactions on Speech and AudioProcessing vol 11 no 6 pp 558ndash567 2003

[25] K Laurila ldquoNoise robust speech recognition with state durationconstraintsrdquo in Proceedings of the IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo97) vol 2pp 871ndash874 IEEE Computer Society Munich Germany April1997

[26] S-Z Yu ldquoHidden semi-Markov modelsrdquo Artificial Intelligencevol 174 no 2 pp 215ndash243 2010

[27] S-Z Yu and H Kobayashi ldquoAn efficient forward-backwardalgorithm for an explicit-duration hiddenMarkovmodelrdquo IEEESignal Processing Letters vol 10 no 1 pp 11ndash14 2003

[28] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006

[29] N Wang S-D Sun Z-Q Cai S Zhang and C SayginldquoA hidden semi-markov model with duration-dependent statetransition probabilities for prognosticsrdquoMathematical Problemsin Engineering vol 2014 Article ID 632702 10 pages 2014

[30] M Azimi P Nasiopoulos and R K Ward ldquoOnline identifica-tion of hidden semi-Markov modelsrdquo in Proceedings of the 3rdInternational Symposium on Image and Signal Processing andAnalysis (ISPA rsquo03) vol 2 pp 991ndash996 Rome Italy September2003

[31] M Azimi Data transmission schemes for a new generation ofinteractive digital television [PhD dissertation] Department ofElectrical and Computer EngineeringTheUniversity of BritishColumbia Vancouver Canada 2008

[32] M Azimi P Nasiopoulos and R K Ward ldquoOffline andonline identification of hidden semi-Markov modelsrdquo IEEETransactions on Signal Processing vol 53 no 8 pp 2658ndash26632005

[33] J Q Li and A R Barron ldquoMixture density estimationrdquo inAdvances in Neural Information Processing Systems 12 pp 279ndash285 MIT Press Boston Mass USA 1999

[34] M Dong and D He ldquoA segmental hidden semi-Markov model(HSMM)-based diagnostics and prognostics framework andmethodologyrdquoMechanical Systems and Signal Processing vol 21no 5 pp 2248ndash2266 2007

[35] T Liu J Chen and G Dong ldquoApplication of continuous hidemarkov model to bearing performance degradation assess-mentrdquo in Proceedings of the 24th International Congress onConditionMonitoring andDiagnostics EngineeringManagement(COMADEM rsquo11) pp 166ndash172 2011

[36] H Ocak and K A Loparo ldquoA new bearing fault detectionand diagnosis scheme based onhidden markov modeling ofvibration signalsrdquo in Proceedings of the IEEE InternationalConference on Acoustics Speech andSignal Processing pp 3141ndash3144 IEEE Computer Society Washington DC USA 2001

[37] C Fraley and A E Raftery ldquoModel-based clustering discrimi-nant analysis and density estimationrdquo Journal of the AmericanStatistical Association vol 97 no 458 pp 611ndash631 2002

[38] K L Nylund T Asparouhov and B O Muthen ldquoDeciding onthe number of classes in latent class analysis and growthmixturemodeling aMonte Carlo simulation studyrdquo Structural EquationModeling vol 14 no 4 pp 535ndash569 2007

[39] T H Lin and C M Dayton ldquoModel selection information cri-teria for non-nested latent class modelsrdquo Journal of Educationaland Behavioral Statistics vol 22 no 3 pp 249ndash264 1997


Figure 9: AIC values as a function of the number of states (2 to 6) for (a) Condition 1 and (b) Condition 2, for the Gaussian, Gamma, and Weibull duration models. In both cases the minimum AIC value corresponds to an HSMM with N = 4 states, a Weibull duration model, and M = 1 mixture in the observation density.
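To make the selection procedure summarized in Figure 9 concrete, a minimal sketch is given below. The helper `fit_hsmm` is hypothetical (not part of the paper's code); each candidate configuration is scored with AIC = 2k − 2 ln L and the minimum is kept.

```python
import itertools

# Candidate configurations explored in the experiments: number of hidden
# states and parametric family of the state duration density.
STATE_COUNTS = (2, 3, 4, 5, 6)
DURATION_FAMILIES = ("gaussian", "gamma", "weibull")

def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2.0 * n_parameters - 2.0 * log_likelihood

def select_model(train_sequences, fit_hsmm):
    """Return (aic_value, n_states, family, model) with the minimum AIC.

    `fit_hsmm(sequences, n_states, family)` is a hypothetical training
    helper assumed to return an object exposing `log_likelihood` (ln L on
    the training data) and `n_parameters` (number of free parameters k).
    """
    best = None
    for n_states, family in itertools.product(STATE_COUNTS, DURATION_FAMILIES):
        model = fit_hsmm(train_sequences, n_states, family)
        score = aic(model.log_likelihood, model.n_parameters)
        if best is None or score < best[0]:
            best = (score, n_states, family, model)
    return best
```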

Figure 10: Estimated RUL (lower bound, average, and upper bound) against the true RUL over time (in seconds) for (a) Bearing1_7 and (b) Bearing2_6. By obtaining a low average absolute prediction error, the proposed parametric HSMM is effective for estimating the Remaining Useful Lifetime of bearings.

well as the lower and the upper bound estimations converge to the real RUL as the real failure time approaches, and the uncertainty about the estimation decreases with time.

Concerning the quantitative estimation of the predictive performances, we report in Table 5 the average absolute prediction error of the RUL estimation (see Equation (64)), expressed in seconds. As can be noticed, the average absolute prediction error of the average RUL is, respectively, 1 hour and 15 minutes for Condition 1 and 1 hour and 5 minutes for Condition 2. These are good values considering the high variability of the training set durations and the fact that the performance metric also takes into account the less accurate predictions performed at an early stage of the bearings' life. Moreover, for both conditions the average prediction errors of 5 tests out of 7 are below the average, while the best average error of the mean RUL is only 23 minutes for Condition 1 and further decreases to 14 minutes for Condition 2.

Table 5: Average absolute prediction error (APE) of the RUL estimation, expressed in seconds.

(a) Condition 1

Test bearing   APEavg    APElow    APEup
Bearing1_1     10571.6   12723.0    9414.6
Bearing1_2      4331.2    3815.6    3821.3
Bearing1_3      2997.0    9730.9    6091.2
Bearing1_4      6336.3    2876.6   14871.9
Bearing1_5      1968.9    7448.4   10411.5
Bearing1_6      4253.0    9896.4    9793.7
Bearing1_7      1388.0    7494.3   10088.1
Average         4549.4    7712.2    9213.2

(b) Condition 2

Test bearing   APEavg    APElow    APEup
Bearing2_1      2475.9    5006.5    7287.5
Bearing2_2      1647.3    4497.2    8288.6
Bearing2_3      8877.1    9508.3    7962.1
Bearing2_4      1769.8    4248.6    4982.5
Bearing2_5      8663.1   10490.0   10730.0
Bearing2_6       877.1    3504.7    6687.0
Bearing2_7      3012.5    3866.4    6651.9
Average         3903.3    5874.5    7512.8
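The hour-and-minute figures quoted above can be rechecked directly from the APEavg column of Table 5; a minimal Python sketch (values copied from the table):

```python
# Per-test average absolute prediction errors of the average RUL (APEavg,
# in seconds), taken from Table 5.
APE_AVG = {
    "Condition 1": [10571.6, 4331.2, 2997.0, 6336.3, 1968.9, 4253.0, 1388.0],
    "Condition 2": [2475.9, 1647.3, 8877.1, 1769.8, 8663.1, 877.1, 3012.5],
}

def whole_minutes(seconds):
    """Whole minutes contained in a duration expressed in seconds."""
    return int(seconds // 60)

for condition, errors in APE_AVG.items():
    mean_s = sum(errors) / len(errors)
    print(f"{condition}: mean APEavg = {mean_s:.1f} s (~{whole_minutes(mean_s)} min), "
          f"best test = {min(errors):.1f} s (~{whole_minutes(min(errors))} min)")
# Condition 1: mean APEavg = 4549.4 s (~75 min), best test = 1388.0 s (~23 min)
# Condition 2: mean APEavg = 3903.3 s (~65 min), best test =  877.1 s (~14 min)
```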

6. Conclusion and Future Work

In this paper we introduced an approach based on Hidden Semi-Markov Models (HSMMs) and the Akaike Information Criterion (AIC) to perform (i) automatic model selection, (ii) online condition monitoring, and (iii) online time-to-event estimation.

The proposed HSMM models the state duration distribution with a parametric density, allowing a less computationally expensive learning procedure thanks to the few parameters that need to be estimated. Together with the provided general model specification, the modified learning, inference, and prediction algorithms allow the usage of any parametric distribution to model the state duration, as well as continuous or discrete observations. As a consequence, a wide class of different applications can be modeled with the proposed methodology.
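As a small illustration of this flexibility (an assumed example, not the paper's implementation), any continuous SciPy density can be turned into a discrete state-duration distribution and plugged into the duration model:

```python
import numpy as np
from scipy import stats

def discretized_duration_pmf(dist, max_duration):
    """Discrete state-duration distribution p(d), d = 1, ..., max_duration.

    `dist` is any frozen continuous scipy.stats distribution; the mass of
    duration d is taken as CDF(d) - CDF(d - 1), renormalized on the support.
    """
    d = np.arange(1, max_duration + 1)
    pmf = dist.cdf(d) - dist.cdf(d - 1)
    return pmf / pmf.sum()

# Interchangeable duration models for one state (shape/scale values are
# purely illustrative): a Weibull and a Gaussian alternative.
weibull_pmf = discretized_duration_pmf(stats.weibull_min(c=2.0, scale=50.0), 200)
gaussian_pmf = discretized_duration_pmf(stats.norm(loc=50.0, scale=10.0), 200)
```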

This paper highlights, through experiments performed on simulated data, that the proposed approach is effective in (i) automatically selecting the correct configuration of the HSMM in terms of number of states and duration distribution family, (ii) performing online state estimation, and (iii) correctly predicting the time to a predefined event, identified as the entrance of the model into a target state. As a consequence, the proposed parametric HSMM combined with AIC can be used in practice for condition monitoring and Remaining Useful Lifetime applications.

As the targeted application of the proposed methodology is failure prognosis in industrial machines, combining the proposed HSMM with an online learning procedure capable of adapting the model parameters to new conditions will be considered in a future extension.

Appendix

In this appendix we give the derivation of the state duration variable introduced in (20), that is,

$$\hat{d}_{t+1}(i) = \frac{a_{ii}(\hat{\mathbf{d}}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\bigl(\hat{d}_t(i)+1\bigr). \tag{A.1}$$

The random variable $d_t(i)$ has been defined in Section 2.1 as the duration spent in state $i$ prior to the current time $t$, assuming that the state at the current time $t$ is $i$; $d_t(i)$ is sampled from an arbitrary distribution

$$d_t(i) \sim f(d). \tag{A.2}$$

We can specify the probability that the system has been in state $i$ for $d$ time units prior to the current time $t$, given the observations and the model parameters $\lambda$ and knowing that the current state is $i$, as

$$P\bigl(d_t(i)=d\bigr) = P\bigl(s_{t-d-1}\neq S_i,\, s_{t-d}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\, \mathbf{x}_1,\ldots,\mathbf{x}_t,\lambda\bigr). \tag{A.3}$$

We omit the conditioning on the model parameters $\lambda$ in the following equations, it being inherently implied. We are interested in deriving the estimator $\hat{d}_t(i)$ of $d_t(i)$, defined as its expected value (see Equation (15)):

$$\hat{d}_t(i) = \mathbb{E}\bigl(d_t(i)\mid s_t=S_i,\,\mathbf{x}_1,\mathbf{x}_2,\cdots,\mathbf{x}_t\bigr), \quad 1\le i\le N. \tag{A.4}$$

From the definition of expectation we have

$$\hat{d}_t(i) = \sum_{d=1}^{t} d\cdot P\bigl(d_t(i)=d\bigr) = \sum_{d=1}^{t} d\cdot P\bigl(s_{t-d-1}\neq S_i,\, s_{t-d}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr). \tag{A.5}$$

For $\hat{d}_{t+1}(i)$ we have

$$\begin{aligned}
\hat{d}_{t+1}(i) &= \sum_{d=1}^{t+1} d\cdot P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) \\
&= \underbrace{P\bigl(s_{t-1}\neq S_i,\, s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{(a)} \tag{A.6}\\
&\quad + \sum_{d=2}^{t+1} d\cdot P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr). \tag{A.7}
\end{aligned}$$

By noticing that

$$P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\, s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \frac{P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i,\, s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}{P\bigl(s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}, \tag{A.8}$$

we can replace the probability in the second term of (A.7) with

$$\begin{aligned}
&P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i,\, s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) \\
&\quad = \underbrace{P\bigl(s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{(b)} \tag{A.9}\\
&\qquad \cdot P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\, s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr). \tag{A.10}
\end{aligned}$$

In the last factor of (A.10) we can omit the information about the current state and observation by observing that

$$P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\, s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) \approx \underbrace{P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)}_{(c)}, \tag{A.11}$$

if the following independencies hold:

$$s_{t+1} \perp \{s_{t-d+1},\ldots,s_{t-1}\} \mid \{s_t,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\}, \qquad \mathbf{x}_{t+1} \perp \{s_{t-d+1},\ldots,s_{t-1}\} \mid \{s_t,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\}, \tag{A.12}$$

where with $\perp$ we denote independence. Equations (A.12) hold for HMMs (even without conditioning on $\mathbf{x}_1,\ldots,\mathbf{x}_t$), but they do not hold for HSMMs, since the state duration (expressed by $s_{t-d+1},\ldots,s_{t-1}$) determines the system evolution. On the other hand, the state duration is partially known through the observations $\mathbf{x}_1,\ldots,\mathbf{x}_t$. Thus the approximation is reasonable as long as the uncertainty on the states is within limits.

From (A.6), (A.9), and (A.11) we obtain

$$\begin{aligned}
\hat{d}_{t+1}(i) &= (a) + \sum_{d=2}^{t+1} d\cdot(b)\cdot(c) \\
&= \underbrace{P\bigl(s_{t-1}\neq S_i,\, s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{P(A,B\mid C)=P(A\mid B,C)\cdot P(B\mid C)} + \sum_{d=2}^{t+1} d\cdot \underbrace{P\bigl(s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{\text{does not depend on } d} \cdot P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) \\
&= P\bigl(s_{t-1}\neq S_i \mid s_t=S_i,\, s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot P\bigl(s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) + P\bigl(s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot\sum_{d=2}^{t+1} d\cdot P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) \\
&= P\bigl(s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot\Bigl[\underbrace{P\bigl(s_{t-1}\neq S_i \mid s_t=S_i,\, s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t,\mathbf{x}_{t+1}\bigr)}_{\text{for the approximation of (A.11)}} + \sum_{d=2}^{t+1} d\cdot P\bigl(s_{t-d}\neq S_i,\, s_{t-d+1}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)\Bigr] \\
&= P\bigl(s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot\Bigl[P\bigl(s_{t-1}\neq S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) + \sum_{d'=1}^{t} (d'+1)\cdot P\bigl(s_{t-d'-1}\neq S_i,\, s_{t-d'}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)\Bigr]. \tag{A.13}
\end{aligned}$$

Noticing that

$$\sum_{d'=1}^{t} P\bigl(s_{t-d'-1}\neq S_i,\, s_{t-d'}=S_i,\ldots,s_{t-1}=S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) + P\bigl(s_{t-1}\neq S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) = 1, \tag{A.14}$$

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, we can rewrite (A.13) as follows:

$$\hat{d}_{t+1}(i) = P\bigl(s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)\cdot\bigl(\hat{d}_t(i)+1\bigr). \tag{A.15}$$

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted by the "amount" of the current state that was already in state $i$ in the previous step.

In order to express (A.15) in terms of model parameters, for an easy numerical calculation of the induction for $\hat{d}_{t+1}(i)$, we can consider the following equality:

$$P\bigl(s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \frac{P\bigl(s_t=S_i,\, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}{\underbrace{P\bigl(s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr)}_{\gamma_{t+1}(i)}}. \tag{A.16}$$

If we consider the terms involved in the probability at the numerator of the right-hand side of (A.16), we have that

$$\underbrace{\mathbf{x}_1,\ldots,\mathbf{x}_t}_{B} \;\perp\; \underbrace{\mathbf{x}_{t+1}}_{C} \;\Big|\; \underbrace{s_t=S_i,\, s_{t+1}=S_i}_{A}. \tag{A.17}$$

If $B\perp C\mid A$, by the Bayes rule we have that

$$P(A\mid C,B) = \frac{P(C\mid A,B)\cdot P(A\mid B)}{P(C\mid B)}. \tag{A.18}$$

Hence we can rewrite the numerator of the right-hand side of (A.16) as follows:

$$\begin{aligned}
P\bigl(s_t=S_i,\, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) &= \frac{P\bigl(s_t=S_i,\, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)\cdot \overbrace{P\bigl(\mathbf{x}_{t+1}\mid s_t=S_i,\, s_{t+1}=S_i\bigr)}^{\mathbf{x}_{t+1}\,\perp\, s_t \,\mid\, s_{t+1}}}{P\bigl(\mathbf{x}_{t+1}\mid \mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)} \\
&= \frac{P\bigl(s_{t+1}=S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)\cdot \overbrace{P\bigl(s_t=S_i\mid \mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)}^{\gamma_t(i)}\cdot \overbrace{P\bigl(\mathbf{x}_{t+1}\mid s_{t+1}=S_i\bigr)}^{b_i(\mathbf{x}_{t+1})}}{P\bigl(\mathbf{x}_{t+1}\mid \mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)}. \tag{A.19}
\end{aligned}$$

The first probability in the numerator of (A.19) is the state transition, which can be approximated by considering the average duration as

$$P\bigl(s_{t+1}=S_i \mid s_t=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) = \sum_{\mathbf{d}_t} a_{ii}(\mathbf{d}_t)\cdot P\bigl(\mathbf{d}_t\mid \mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) \approx a_{ii}(\hat{\mathbf{d}}_t), \tag{A.20}$$

while the denominator of (A.19) can be expressed as follows:

$$P\bigl(\mathbf{x}_{t+1}\mid \mathbf{x}_1,\ldots,\mathbf{x}_t\bigr) = \frac{P\bigl(\mathbf{x}_1,\ldots,\mathbf{x}_t,\mathbf{x}_{t+1}\bigr)}{P\bigl(\mathbf{x}_1,\ldots,\mathbf{x}_t\bigr)} = \frac{\sum_{j=1}^{N}\alpha_{t+1}(j)}{\sum_{j=1}^{N}\alpha_t(j)}. \tag{A.21}$$

By substituting (A.20) and (A.21) in (A.19) we obtain

$$P\bigl(s_t=S_i,\, s_{t+1}=S_i \mid \mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \frac{a_{ii}(\hat{\mathbf{d}}_t)\cdot\gamma_t(i)\cdot\sum_{j=1}^{N}\alpha_t(j)\cdot b_i(\mathbf{x}_{t+1})}{\sum_{j=1}^{N}\alpha_{t+1}(j)}, \tag{A.22}$$

and then, by combining (A.22) and (A.16), we obtain

$$P\bigl(s_t=S_i \mid s_{t+1}=S_i,\,\mathbf{x}_1,\ldots,\mathbf{x}_{t+1}\bigr) = \frac{a_{ii}(\hat{\mathbf{d}}_t)\cdot\gamma_t(i)\cdot\sum_{j=1}^{N}\alpha_t(j)\cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i)\cdot\sum_{j=1}^{N}\alpha_{t+1}(j)}. \tag{A.23}$$

Finally, by substituting (A.23) in (A.15) and considering that

$$\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{j=1}^{N}\alpha_t(j)}, \tag{A.24}$$

we derive the induction formula for $\hat{d}_{t+1}(i)$ in terms of model parameters as

$$\hat{d}_{t+1}(i) = \frac{a_{ii}(\hat{\mathbf{d}}_t)\cdot\alpha_t(i)\cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)}\cdot\bigl(\hat{d}_t(i)+1\bigr). \tag{A.25}$$
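For completeness, a small NumPy sketch of the final induction (A.25) is given below. The array names and example values are assumptions made here for illustration only; in the actual filter, the forward variables come from the forward recursion and a_ii(d̂_t) from the fitted duration model.

```python
import numpy as np

def update_duration_estimate(d_hat_t, a_self, alpha_t, alpha_t1, b_x_t1):
    """One step of the induction (A.25), vectorized over the N states.

    d_hat_t[i]  : estimated duration spent in state i up to time t
    a_self[i]   : duration-dependent self-transition a_ii evaluated at d_hat_t
    alpha_t[i]  : forward variable alpha_t(i)
    alpha_t1[i] : forward variable alpha_{t+1}(i)
    b_x_t1[i]   : observation likelihood b_i(x_{t+1})
    """
    # P(s_t = S_i | s_{t+1} = S_i, x_1, ..., x_{t+1}), cf. (A.23)-(A.24).
    stay_prob = a_self * alpha_t * b_x_t1 / alpha_t1
    # Previous average duration plus one, weighted by that probability (A.15).
    return stay_prob * (d_hat_t + 1.0)

# Example with N = 4 states and purely illustrative values.
d_hat = update_duration_estimate(
    d_hat_t=np.array([3.2, 1.0, 0.4, 0.1]),
    a_self=np.array([0.9, 0.8, 0.7, 0.95]),
    alpha_t=np.array([0.5, 0.3, 0.15, 0.05]),
    alpha_t1=np.array([0.48, 0.3, 0.17, 0.05]),
    b_x_t1=np.array([0.9, 0.8, 0.7, 0.6]),
)
```

The update only needs quantities already produced by the forward pass, which is what keeps the additional inference cost of the duration estimate low.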

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] L. Solomon, "Essential elements of maintenance improvement programs," in Proceedings of the IFAC Workshop on Production Control in the Process Industry, Osaka and Kariya, Japan, October-November 1989, E. Oshima and C. van Rijn, Eds., pp. 195–198, Pergamon Press, Oxford, UK, 1989.
[2] T. Honkanen, Modelling Industrial Maintenance Systems and the Effects of Automatic Condition Monitoring [Ph.D. dissertation], Helsinki University of Technology, Information and Computer Systems in Automation, 2004.
[3] R. Dekker, "Applications of maintenance optimization models: a review and analysis," Reliability Engineering & System Safety, vol. 51, no. 3, pp. 229–240, 1996.
[4] H. Wang, "A survey of maintenance policies of deteriorating systems," European Journal of Operational Research, vol. 139, no. 3, pp. 469–489, 2002.
[5] AFNOR, "Condition monitoring and diagnostics of machines—prognostics—part 1: general guidelines," Tech. Rep. NF ISO 13381-1, 2005.
[6] F. Salfner, Event-Based Failure Prediction: An Extended Hidden Markov Model Approach [Ph.D. thesis], Humboldt-Universität zu Berlin, Germany, 2008.
[7] C. Domeniconi, C.-S. Perng, R. Vilalta, and S. Ma, "A classification approach for prediction of target events in temporal sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '02), pp. 125–137, Springer, London, UK, 2002, http://dl.acm.org/citation.cfm?id=645806.670309.
[8] K. Medjaher, J.-Y. Moya, and N. Zerhouni, "Failure prognostic by using dynamic Bayesian networks," in Dependable Control of Discrete Systems: 2nd IFAC Workshop on Dependable Control of Discrete Systems (DCDS '09), June 2009, Bari, Italy, M. P. Fanti and M. Dotoli, Eds., vol. 1, pp. 291–296, International Federation of Automatic Control, 2009, http://hal.archives-ouvertes.fr/hal-00402938/en/.
[9] A. Sfetsos, "Short-term load forecasting with a hybrid clustering algorithm," IEE Proceedings: Generation, Transmission and Distribution, vol. 150, no. 3, pp. 257–262, 2003.
[10] R. Vilalta and S. Ma, "Predicting rare events in temporal domains," in Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM '02), pp. 474–481, December 2002.
[11] E. Sutrisno, H. Oh, A. S. S. Vasan, and M. Pecht, "Estimation of remaining useful life of ball bearings using data driven methodologies," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–7, Denver, Colo, USA, June 2012.
[12] K. Goebel, B. Saha, and A. Saxena, "A comparison of three data-driven techniques for prognostics," in Proceedings of the 62nd Meeting of the Society for Machinery Failure Prevention Technology, April 2008.
[13] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[14] F. Cartella, T. Liu, S. Meganck, J. Lemeire, and H. Sahli, "Online adaptive learning of left-right continuous HMM for bearings condition assessment," Journal of Physics: Conference Series, vol. 364, Article ID 012031, 2012, http://iopscience.iop.org/1742-6596/364/1/012031.
[15] S. Lee, L. Li, and J. Ni, "Online degradation assessment and adaptive fault detection using modified hidden Markov model," Journal of Manufacturing Science and Engineering, vol. 132, no. 2, Article ID 021010, 11 pages, 2010.
[16] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A mixture of Gaussians hidden Markov model for failure diagnostic and prognostic," in Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE '10), pp. 338–343, Toronto, Canada, August 2010.
[17] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "Estimation of the remaining useful life by using wavelet packet decomposition and HMMs," in Proceedings of the IEEE Aerospace Conference (AERO '11), pp. 1–10, IEEE Computer Society, AIAA, Big Sky, Mont, USA, March 2011.
[18] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491–503, 2012.
[19] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[20] J. Ferguson, "Variable duration models for speech," in Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179, October 1980.
[21] K. P. Murphy, "Hidden semi-Markov models (HSMMs)," Tech. Rep., University of British Columbia, 2002, http://www.cs.ubc.ca/~murphyk.
[22] A. Kundu, T. Hines, J. Phillips, B. D. Huyck, and L. C. Van Guilder, "Arabic handwriting recognition using variable duration HMM," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), vol. 2, pp. 644–648, IEEE, Washington, DC, USA, September 2007.
[23] M. T. Johnson, "Capacity and complexity of HMM duration modeling techniques," IEEE Signal Processing Letters, vol. 12, no. 5, pp. 407–410, 2005.
[24] J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558–567, 2003.
[25] K. Laurila, "Noise robust speech recognition with state duration constraints," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), vol. 2, pp. 871–874, IEEE Computer Society, Munich, Germany, April 1997.
[26] S.-Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010.
[27] S.-Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 11–14, 2003.
[28] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[29] N. Wang, S.-D. Sun, Z.-Q. Cai, S. Zhang, and C. Saygin, "A hidden semi-Markov model with duration-dependent state transition probabilities for prognostics," Mathematical Problems in Engineering, vol. 2014, Article ID 632702, 10 pages, 2014.
[30] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Online identification of hidden semi-Markov models," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA '03), vol. 2, pp. 991–996, Rome, Italy, September 2003.
[31] M. Azimi, Data Transmission Schemes for a New Generation of Interactive Digital Television [Ph.D. dissertation], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2008.
[32] M. Azimi, P. Nasiopoulos, and R. K. Ward, "Offline and online identification of hidden semi-Markov models," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2658–2663, 2005.
[33] J. Q. Li and A. R. Barron, "Mixture density estimation," in Advances in Neural Information Processing Systems 12, pp. 279–285, MIT Press, Boston, Mass, USA, 1999.
[34] M. Dong and D. He, "A segmental hidden semi-Markov model (HSMM)-based diagnostics and prognostics framework and methodology," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2248–2266, 2007.
[35] T. Liu, J. Chen, and G. Dong, "Application of continuous hidden Markov model to bearing performance degradation assessment," in Proceedings of the 24th International Congress on Condition Monitoring and Diagnostics Engineering Management (COMADEM '11), pp. 166–172, 2011.
[36] H. Ocak and K. A. Loparo, "A new bearing fault detection and diagnosis scheme based on hidden Markov modeling of vibration signals," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3141–3144, IEEE Computer Society, Washington, DC, USA, 2001.
[37] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611–631, 2002.
[38] K. L. Nylund, T. Asparouhov, and B. O. Muthén, "Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study," Structural Equation Modeling, vol. 14, no. 4, pp. 535–569, 2007.
[39] T. H. Lin and C. M. Dayton, "Model selection information criteria for non-nested latent class models," Journal of Educational and Behavioral Statistics, vol. 22, no. 3, pp. 249–264, 1997.
[40] O. Lukociene and J. K. Vermunt, "Determining the number of components in mixture models for hierarchical data," in Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 241–249, Springer, New York, NY, USA, 2008.
[41] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[42] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, Chapman & Hall/CRC, 1997.
[43] R. J. MacKay, "Estimating the order of a hidden Markov model," The Canadian Journal of Statistics, vol. 30, no. 4, pp. 573–589, 2002.
[44] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, no. 1, pp. 29–45, 1986.
[45] C. D. Mitchell and L. H. Jamieson, "Modeling duration in a hidden Markov model with the exponential family," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 331–334, Minneapolis, Minn, USA, April 1993.
[46] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[47] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[48] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[49] P. Nectoux, R. Gouriveau, K. Medjaher et al., "PRONOSTIA: an experimental platform for bearings accelerated life test," in Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, Colo, USA, 2012.
[50] P. O'Donnell, "Report of large motor reliability survey of industrial and commercial installations, part I and II," IEEE Transactions on Industry Applications, vol. 21, no. 4, pp. 853–872, 1985.
[51] P. Boškoski, M. Gašperin, D. Petelin, and Đ. Juričić, "Bearing fault prognostics using Rényi entropy based features and Gaussian process models," Mechanical Systems and Signal Processing, vol. 52-53, pp. 327–337, 2015.
[52] B. Chouri, F. Montero, M. Tabaa, and A. Dandache, "Residual useful life estimation based on stable distribution feature extraction and SVM classifier," Journal of Theoretical and Applied Information Technology, vol. 55, no. 3, pp. 299–306, 2013.
[53] K. Javed, R. Gouriveau, N. Zerhouni, and P. Nectoux, "A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '13), pp. 1–7, June 2013.
[54] K. Medjaher, N. Zerhouni, and J. Baklouti, "Data-driven prognostics based on health indicator construction: application to PRONOSTIA's data," in Proceedings of the 12th European Control Conference (ECC '13), pp. 1451–1456, Zurich, Switzerland, July 2013.
[55] A. Mosallam, K. Medjaher, and N. Zerhouni, "Nonparametric time series modelling for industrial prognostics and health management," International Journal of Advanced Manufacturing Technology, vol. 69, no. 5–8, pp. 1685–1699, 2013.
[56] S. Porotsky and Z. Bluvband, "Remaining useful life estimation for systems with non-trendability behaviour," in Proceedings of the IEEE Conference on Prognostics and Health Management (PHM '12), pp. 1–6, Denver, Colo, USA, June 2012.
[57] L. Serir, E. Ramasso, and N. Zerhouni, "An evidential evolving prognostic approach and its application to PRONOSTIA's data streams," in Annual Conference of the Prognostics and Health Management Society, p. 9, 2012.
[58] F. Sloukia, M. El Aroussi, H. Medromi, and M. Wahbi, "Bearings prognostic using mixture of Gaussians hidden Markov model and support vector machine," in Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA '13), pp. 1–4, May 2013.
[59] B. Zhang, L. Zhang, and J. Xu, "Remaining useful life prediction for rolling element bearing based on ensemble learning," Chemical Engineering Transactions, vol. 33, pp. 157–162, 2013.


Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 20: Hidden Semi-Markov Models for Predictive Maintenance

20 Mathematical Problems in Engineering

we can replace the probability of the second term of (A7) with

\[
P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})
= \underbrace{P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})}_{(b)}   (A9)
\]
\[
\cdot\, P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}).   (A10)
\]

In the last factor of (A10) we can omit the information about the current state and observation by observing that

\[
P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})
\approx \underbrace{P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t)}_{(c)}   (A11)
\]

if the following independencies hold:

\[
s_{t+1} \perp \{s_{t-d+1}, \ldots, s_{t-1}\} \mid s_t, \mathbf{x}_1, \ldots, \mathbf{x}_t,
\qquad
X_{t+1} \perp \{s_{t-d+1}, \ldots, s_{t-1}\} \mid s_t, \mathbf{x}_1, \ldots, \mathbf{x}_t,   (A12)
\]

where with $\perp$ we denote independency. The independencies in (A12) hold for HMMs (even without conditioning on $\mathbf{x}_1, \ldots, \mathbf{x}_t$), but they do not hold for HSMMs, since the state duration (expressed by $s_{t-d+1}, \ldots, s_{t-1}$) determines the system evolution. On the other hand, the state duration is partially known through the observations $\mathbf{x}_1, \ldots, \mathbf{x}_t$; thus the approximation is reasonable as long as the uncertainty on the states remains limited. From (A6), (A9), and (A11) we obtain

\[
\bar{d}_{t+1}(i) = (a) + \sum_{d=2}^{t+1} d \cdot (b) \cdot (c)
\]
\[
= \underbrace{P(s_{t-1} \neq S_i, s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})}_{P(A, B \mid C) = P(A \mid B, C) \cdot P(B \mid C)}
+ \sum_{d=2}^{t+1} d \cdot \underbrace{P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})}_{\text{it does not depend on } d}
\cdot P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t)
\]
\[
= P(s_{t-1} \neq S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) \cdot P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})
+ P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) \cdot \sum_{d=2}^{t+1} d \cdot P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t)
\]
\[
= P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) \cdot \Big[ \underbrace{P(s_{t-1} \neq S_i \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1})}_{\text{for the approximation of (A11)}}
+ \sum_{d=2}^{t+1} d \cdot P(s_{t-d} \neq S_i, s_{t-d+1} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) \Big]
\]
\[
= P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) \cdot \Big[ P(s_{t-1} \neq S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t)
+ \sum_{d'=1}^{t} (d'+1) \cdot P(s_{t-d'-1} \neq S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) \Big],   (A13)
\]

where the last equality uses the approximation of (A11) and the change of index $d' = d - 1$.

Noticing that

\[
\sum_{d'=1}^{t} P(s_{t-d'-1} \neq S_i, s_{t-d'} = S_i, \ldots, s_{t-1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t)
+ P(s_{t-1} \neq S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) = 1,   (A14)
\]

because it represents the sum of the probabilities of all the possible combinations of state sequences up to the current time $t$, the bracketed term of (A13) reduces to $\bar{d}_t(i) + 1$, and we can rewrite (A13) as follows:

\[
\bar{d}_{t+1}(i) = P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) \cdot \big(\bar{d}_t(i) + 1\big).   (A15)
\]

The intuition behind the latter induction formula is that the current average duration is the previous average duration plus 1, weighted with the "amount" of the current state that was already in state $i$ in the previous step.
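As an illustration with made-up numbers (not taken from the experiments reported in this paper): if $\bar{d}_t(i) = 4$ and $P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1}) = 0.9$, then (A15) gives

\[
\bar{d}_{t+1}(i) = 0.9 \cdot (4 + 1) = 4.5,
\]

so the average duration advances by a full step only when the model is certain that no state change has occurred, and it is pulled toward zero when a change is likely.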

In order to express (A15) in terms of model parameters, for an easy numerical calculation of the induction for $\bar{d}_{t+1}(i)$, we can consider the following equality:

\[
P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})
= \frac{P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})}{\underbrace{P(s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})}_{\gamma_{t+1}(i)}}.   (A16)
\]

If we consider the terms involved in the probability at the numerator of the right-hand side of (A16), we have that

\[
\underbrace{\mathbf{x}_1, \ldots, \mathbf{x}_t}_{B} \;\perp\; \underbrace{\mathbf{x}_{t+1}}_{C} \;\big|\; \underbrace{s_t = S_i, s_{t+1} = S_i}_{A}.   (A17)
\]


If $B \perp C \mid A$, by the Bayes rule we have that

\[
P(A \mid C, B) = \frac{P(C \mid A, B) \cdot P(A \mid B)}{P(C \mid B)}.   (A18)
\]
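Instantiated with the groupings of (A17), that is, $A = \{s_t = S_i, s_{t+1} = S_i\}$, $B = \{\mathbf{x}_1, \ldots, \mathbf{x}_t\}$, and $C = \{\mathbf{x}_{t+1}\}$, the rule (A18) reads

\[
P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})
= \frac{P(\mathbf{x}_{t+1} \mid s_t = S_i, s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) \cdot P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_t)}{P(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t)},
\]

and the independency (A17) allows $\mathbf{x}_1, \ldots, \mathbf{x}_t$ to be dropped from the first factor of the numerator; this is exactly the factorization applied in the first step of (A19) below.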

Hence, we can rewrite the numerator of the right-hand side of (A16) as follows:

\[
P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})
= \Big( P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) \cdot \overbrace{P(\mathbf{x}_{t+1} \mid s_t = S_i, s_{t+1} = S_i)}^{\mathbf{x}_{t+1} \perp s_t \mid s_{t+1}} \Big)
\cdot \big( P(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) \big)^{-1}
\]
\[
= \Big( P(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t) \cdot \overbrace{P(s_t = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_t)}^{\gamma_t(i)} \cdot \overbrace{P(\mathbf{x}_{t+1} \mid s_{t+1} = S_i)}^{b_i(\mathbf{x}_{t+1})} \Big)
\cdot \big( P(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t) \big)^{-1}.   (A19)
\]

The first probability in the numerator of (A19) is the state transition, which can be approximated by considering the average duration as

\[
P(s_{t+1} = S_i \mid s_t = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_t)
= \sum_{d_t} a_{ii}(d_t) \cdot P(d_t \mid \mathbf{x}_1, \ldots, \mathbf{x}_t)
\approx a_{ii}(\bar{\mathbf{d}}_t),   (A20)
\]
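The nature of the approximation in (A20), replacing the expectation of the duration-dependent self-transition by its value at the average duration, can be checked numerically. The snippet below uses a hypothetical $a_{ii}(d)$ and a hypothetical duration posterior, chosen only for illustration and not part of the model of this paper.

```python
import numpy as np

# Illustration of the plug-in approximation in (A20): the expectation of the
# duration-dependent self-transition is replaced by a_ii evaluated at the
# average duration. Both a_ii(d) and the duration posterior are hypothetical.
def a_ii(d, scale=10.0):
    # hypothetical self-transition that decays as the sojourn gets longer
    return np.exp(-d / scale)

durations = np.arange(1, 8)                              # support of P(d_t | x_1..x_t)
p_d = np.array([0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05])    # hypothetical posterior

exact = np.sum(a_ii(durations) * p_d)        # sum_d a_ii(d) * P(d_t | x_1..x_t)
plug_in = a_ii(np.dot(durations, p_d))       # a_ii(average duration)
print(exact, plug_in)                        # close when the posterior is peaked
```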

while the denominator of (A19) can be expressed as follows:

\[
P(\mathbf{x}_{t+1} \mid \mathbf{x}_1, \ldots, \mathbf{x}_t)
= \frac{P(\mathbf{x}_1, \ldots, \mathbf{x}_t, \mathbf{x}_{t+1})}{P(\mathbf{x}_1, \ldots, \mathbf{x}_t)}
= \frac{\sum_{i=1}^{N} \alpha_{t+1}(i)}{\sum_{i=1}^{N} \alpha_t(i)}.   (A21)
\]

By substituting (A20) and (A21) in (A19), we obtain

\[
P(s_t = S_i, s_{t+1} = S_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})
= \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\sum_{i=1}^{N} \alpha_{t+1}(i)},   (A22)
\]

and then, by combining (A22) and (A16), we obtain

\[
P(s_t = S_i \mid s_{t+1} = S_i, \mathbf{x}_1, \ldots, \mathbf{x}_{t+1})
= \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \gamma_t(i) \cdot \sum_{i=1}^{N} \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\gamma_{t+1}(i) \cdot \sum_{i=1}^{N} \alpha_{t+1}(i)}.   (A23)
\]

Finally, by substituting (A23) in (A15) and considering that

\[
\gamma_t(i) = \frac{\alpha_t(i)}{\sum_{i=1}^{N} \alpha_t(i)},   (A24)
\]

we derive the induction formula for $\bar{d}_{t+1}(i)$ in terms of model parameters as

\[
\bar{d}_{t+1}(i) = \frac{a_{ii}(\bar{\mathbf{d}}_t) \cdot \alpha_t(i) \cdot b_i(\mathbf{x}_{t+1})}{\alpha_{t+1}(i)} \cdot \big(\bar{d}_t(i) + 1\big).   (A25)
\]
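A minimal sketch of how (A25) can be evaluated in practice is given below. The function and variable names are illustrative; the forward variables $\alpha_t(i)$, the observation likelihoods $b_i(\mathbf{x}_{t+1})$, and the duration-dependent self-transitions are assumed to be supplied by the model's forward pass, which is not reproduced here.

```python
import numpy as np

def update_average_durations(d_bar, alpha_t, alpha_t1, b_x_t1, a_ii):
    """One step of the induction (A25) for the average state durations.

    d_bar    : current averages d_bar_t(i), one entry per state
    alpha_t  : forward variables alpha_t(i)
    alpha_t1 : forward variables alpha_{t+1}(i)
    b_x_t1   : observation likelihoods b_i(x_{t+1})
    a_ii     : self-transition probabilities evaluated at the average durations
    """
    # P(s_t = S_i | s_{t+1} = S_i, x_1..x_{t+1}), as obtained in (A23)-(A24)
    weight = (a_ii * alpha_t * b_x_t1) / alpha_t1
    # (A25): previous average plus one, weighted by the probability above
    return weight * (d_bar + 1.0)

# Illustrative numbers only (not taken from the paper's experiments):
d_bar    = np.array([3.2, 0.4])
alpha_t  = np.array([0.60, 0.10])
b_x_t1   = np.array([0.80, 0.30])
a_ii     = np.array([0.90, 0.70])
alpha_t1 = a_ii * alpha_t * b_x_t1 + np.array([0.02, 0.01])  # stand-in forward update
print(update_average_durations(d_bar, alpha_t, alpha_t1, b_x_t1, a_ii))
```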

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 21: Hidden Semi-Markov Models for Predictive Maintenance

Mathematical Problems in Engineering 21

If 119861 perp 119862 | 119860 for the Bayes rule we have that

P (119860 | 119862 119861) =P (119862 | 119860119861) sdot P (119860 | 119861)

P (119862 | 119861) (A18)

Hence we can rewrite the numerator of the right-hand sideof (A16) as follows

P (119904119905= 119878

119894 119904

119905+1= 119878

119894| x

1 x

119905+1)

= (P (119904119905= 119878

119894 119904

119905+1= 119878

119894| x

1 x

119905)

sdot P(x119905+1

|

x119905+1

perp119904119905|119904119905+1

⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞119904119905= 119878

119894 119904

119905+1= 119878

119894))

sdot (P (x119905+1

| x1 x

119905))

minus1

= (P (119904119905+1

= 119878119894| 119904

119905= 119878

119894 x

1 x

119905)

sdot

120574119905(119894)

⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞P (119904

119905= 119878

119894| x

1 x

119905) sdot

119887119894(x119905+1

)

⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞P (x

119905+1| 119904

119905+1= 119878

119894))

sdot (P (x119905+1

| x1 x

119905))

minus1

(A19)

The first probability in the numerator of (A19) is the statetransition which can be approximated by considering theaverage duration as

P (119904119905+1

= 119878119894| 119904

119905= 119878

119894 x

1 x

119905)

= sum

119889119905

119886119894119894(d

119905) sdot P (119889

119905| x

1 x

119905)

asymp 119886119894119894(d

119905)

(A20)

while the denominator of (A19) can be expressed as follows

P (x119905+1

| x1 x

119905) =

P (x1 x

119905 x

119905+1)

P (x1 x

119905)

=sum

119873

119894=1120572119905+1

(119894)

sum119873

119894=1120572119905(119894)

(A21)

By substituting (A20) and (A21) in (A19) we obtain

P (119904119905= 119878

119894 119904

119905+1= 119878

119894| x

1 x

119905+1)

=119886119894119894(d

119905) sdot 120574

119905(119894) sdot sum

119873

119894=1120572119905(119894) sdot 119887

119894(x

119905+1)

sum119873

119894=1120572119905+1

(119894)

(A22)

and then by combining (A22) and (A16) we obtain

P (119904119905= 119878

119894| 119904

119905+1= 119878

119894 x

1 x

119905+1)

=119886119894119894(d

119905) sdot 120574

119905(119894) sdot sum

119873

119894=1120572119905(119894) sdot 119887

119894(x

119905+1)

120574119905+1

(119894) sum119873

119894=1120572119905+1

(119894)

(A23)

Finally by substituting (A23) in (A15) and considering that

120574119905(119894) =

120572119905(119894)

sum119873

119894=1120572119905(119894)

(A24)

we derive the induction formula for 119889119905+1

(119894) in terms of modelparameters as

119889119905+1

(119894) =119886119894119894(d

119905) sdot 120572

119905(119894) sdot 119887

119894(x

119905+1)

120572119905+1

(119894)sdot (119889

119905(119894) + 1) (A25)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] L Solomon ldquoEssential elements of maintenance improvementprogramsrdquo in Proceedings of the IFAC Workshopon ProductionControl in the Process Industry Osaka Japan and Kariya JapanOctober-November 1989 E Oshima and C van Rijn Eds pp195ndash198 Pergamon Press Oxford UK 1989

[2] T HonkanenModelling industrial maintenance systems and theeffects of automatic condition monitoring [PhD dissertation]Helsinki University of Technology Information and ComputerSystems in Automation 2004

[3] R Dekker ldquoApplications of maintenance optimization modelsa review and analysisrdquo Reliability Engineering amp System Safetyvol 51 no 3 pp 229ndash240 1996

[4] H Wang ldquoA survey of maintenance policies of deterioratingsystemsrdquo European Journal of Operational Research vol 139 no3 pp 469ndash489 2002

[5] AFNOR ldquoCondition monitoring and diagnostics of ma-chinesmdashprognosticsmdashpart 1 generalguidelinesrdquo Tech Rep NFISO 13381-1 2005

[6] F Salfner Event-based failure prediction an extended hiddenmarkov model approach [PhD thesis] Humboldt-Universitatzu Berlin Germany 2008

[7] C Domeniconi C-S Perng R Vilalta and S Ma ldquoA clas-sification approachfor prediction of target events in temporalsequencesrdquo in Proceedings of the 6th European Conference onPrinciples of Data Mining and Knowledge Discovery (PKDDrsquo02) pp 125ndash137 Springer LondonUK 2002 httpdlacmorgcitationcfmid=645806670309

[8] K Medjaher J-Y Moya and N Zerhouni ldquoFailure prognosticby using dynamic Bayesian networksrdquo in Dependable Controlof Discrete Systems 2nd IFACWorkshop on Dependable Controlof Discrete Systems (DCDS rsquo09) June 2009 Bari Italy MP Fanti and M Dotoli Eds vol 1 pp 291ndash296 Interna-tional Federation of Accountants New York NY USA 2009httphalarchives-ouvertesfrhal-00402938en

[9] A Sfetsos ldquoShort-term load forecasting with a hybrid cluster-ing algorithmrdquo IEE Proceedings Generation Transmission andDistribution vol 150 no 3 pp 257ndash262 2003

[10] R Vilalta and S Ma ldquoPredicting rare events in temporaldomainsrdquo in Proceedings of the 2nd IEEE International Confer-ence on Data Mining (ICDM rsquo02) pp 474ndash481 December 2002

[11] E Sutrisno H Oh A S S Vasan and M Pecht ldquoEstimationof remaining useful life of ball bearings using data driven

22 Mathematical Problems in Engineering

methodologiesrdquo in Proceedings of the IEEE Conference onPrognostics and HealthManagement (PHM rsquo12) pp 1ndash7 DenverColo USA June 2012

[12] K Goebel B Saha and A Saxena ldquoA comparison of threedata-driven techniques for prognosticsrdquo in Proceedings of the62nd Meeting of the Society for Machinery Failure PreventionTechnology April 2008

[13] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989

[14] F Cartella T Liu S Meganck J Lemeire and H Sahli ldquoOnlineadaptive learning of left-right continuous HMM for bearingscondition assessmentrdquo Journal of Physics Conference Seriesvol 364 Article ID 012031 2012 httpiopscienceioporg1742-65963641012031

[15] S Lee L Li and J Ni ldquoOnline degradation assessment andadaptive fault detection usingmodified hidden markov modelrdquoJournal of Manufacturing Science and Engineering vol 132 no2 Article ID 021010 11 pages 2010

[16] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA mixture of gaussians hiddenmarkov model for failure diag-nostic and prognosticrdquo in Proceedings of the IEEE InternationalConference on Automation Science and Engineering (CASE rsquo10)pp 338ndash343 Toronto Canada August 2010

[17] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoEstimation of the remaining useful life by using waveletpacket decomposition and HMMsrdquo in Proceedings of the IEEEAerospace Conference (AERO rsquo11) pp 1ndash10 IEEE ComputerSociety AIAA Big Sky Mont USA March 2011

[18] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA data-driven failure prognostics method based on mixtureof gaussians hidden markov modelsrdquo IEEE Transactions onReliability vol 61 no 2 pp 491ndash503 2012

[19] K Medjaher D A Tobon-Mejia and N Zerhouni ldquoRemaininguseful life estimation of critical components with applicationto bearingsrdquo IEEE Transactions on Reliability vol 61 no 2 pp292ndash302 2012

[20] J Ferguson ldquoVariable duration models for speechrdquo in Proceed-ings of the Symposium on the Application of Hidden MarkovModels to Text and Speech pp 143ndash179 October 1980

[21] K P Murphy ldquoHidden semi-Markov models (hsmms)rdquo TechRep University of British Columbia 2002 httpwwwcsubccasimmurphyk

[22] A Kundu T Hines J Phillips B D Huyck and L C VanGuilder ldquoArabic handwriting recognition using variable dura-tion HMMrdquo in Proceedings of the 9th International Conferenceon Document Analysis and Recognition (ICDAR rsquo07) vol 2 pp644ndash648 IEEE Washington DC USA September 2007

[23] M T Johnson ldquoCapacity and complexity of HMM durationmodeling techniquesrdquo IEEE Signal Processing Letters vol 12 no5 pp 407ndash410 2005

[24] J-T Chien and C-H Huang ldquoBayesian learning of speechduration modelsrdquo IEEE Transactions on Speech and AudioProcessing vol 11 no 6 pp 558ndash567 2003

[25] K Laurila ldquoNoise robust speech recognition with state durationconstraintsrdquo in Proceedings of the IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo97) vol 2pp 871ndash874 IEEE Computer Society Munich Germany April1997

[26] S-Z Yu ldquoHidden semi-Markov modelsrdquo Artificial Intelligencevol 174 no 2 pp 215ndash243 2010

[27] S-Z Yu and H Kobayashi ldquoAn efficient forward-backwardalgorithm for an explicit-duration hiddenMarkovmodelrdquo IEEESignal Processing Letters vol 10 no 1 pp 11ndash14 2003

[28] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006

[29] N Wang S-D Sun Z-Q Cai S Zhang and C SayginldquoA hidden semi-markov model with duration-dependent statetransition probabilities for prognosticsrdquoMathematical Problemsin Engineering vol 2014 Article ID 632702 10 pages 2014

[30] M Azimi P Nasiopoulos and R K Ward ldquoOnline identifica-tion of hidden semi-Markov modelsrdquo in Proceedings of the 3rdInternational Symposium on Image and Signal Processing andAnalysis (ISPA rsquo03) vol 2 pp 991ndash996 Rome Italy September2003

[31] M Azimi Data transmission schemes for a new generation ofinteractive digital television [PhD dissertation] Department ofElectrical and Computer EngineeringTheUniversity of BritishColumbia Vancouver Canada 2008

[32] M Azimi P Nasiopoulos and R K Ward ldquoOffline andonline identification of hidden semi-Markov modelsrdquo IEEETransactions on Signal Processing vol 53 no 8 pp 2658ndash26632005

[33] J Q Li and A R Barron ldquoMixture density estimationrdquo inAdvances in Neural Information Processing Systems 12 pp 279ndash285 MIT Press Boston Mass USA 1999

[34] M Dong and D He ldquoA segmental hidden semi-Markov model(HSMM)-based diagnostics and prognostics framework andmethodologyrdquoMechanical Systems and Signal Processing vol 21no 5 pp 2248ndash2266 2007

[35] T Liu J Chen and G Dong ldquoApplication of continuous hidemarkov model to bearing performance degradation assess-mentrdquo in Proceedings of the 24th International Congress onConditionMonitoring andDiagnostics EngineeringManagement(COMADEM rsquo11) pp 166ndash172 2011

[36] H Ocak and K A Loparo ldquoA new bearing fault detectionand diagnosis scheme based onhidden markov modeling ofvibration signalsrdquo in Proceedings of the IEEE InternationalConference on Acoustics Speech andSignal Processing pp 3141ndash3144 IEEE Computer Society Washington DC USA 2001

[37] C Fraley and A E Raftery ldquoModel-based clustering discrimi-nant analysis and density estimationrdquo Journal of the AmericanStatistical Association vol 97 no 458 pp 611ndash631 2002

[38] K L Nylund T Asparouhov and B O Muthen ldquoDeciding onthe number of classes in latent class analysis and growthmixturemodeling aMonte Carlo simulation studyrdquo Structural EquationModeling vol 14 no 4 pp 535ndash569 2007

[39] T H Lin and C M Dayton ldquoModel selection information cri-teria for non-nested latent class modelsrdquo Journal of Educationaland Behavioral Statistics vol 22 no 3 pp 249ndash264 1997

[40] O Lukociene and J K Vermunt ldquoDetermining the numberof components in mixture models for hierarchical datardquo inAdvances in Data Analysis Data Handling and Business Intel-ligence Studies in Classification Data Analysis and KnowledgeOrganization pp 241ndash249 Springer New York NY USA 2008

[41] O Cappe E Moulines and T Ryden Inference in HiddenMarkov Models Springer Series in Statistics Springer NewYork NY USA 2005

[42] I L MacDonald and W Zucchini Hidden Markov and OtherModels for Discrete-Valued Time Series Chapman amp HallCRC1997

Mathematical Problems in Engineering 23

[43] R J MacKay ldquoEstimating the order of a hiddenMarkovmodelrdquoThe Canadian Journal of Statistics vol 30 no 4 pp 573ndash5892002

[44] S E Levinson ldquoContinuously variable duration hiddenMarkovmodels for automatic speechrecognitionrdquo Computer Speech andLanguage vol 1 no 1 pp 29ndash45 1986

[45] C D Mitchell and L H Jamieson ldquoModeling duration in ahidden Markov model with the exponential familyrdquo in Pro-ceedings of the IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP rsquo93) vol 2 pp 331ndash334Minneapolis Minn USA April 1993

[46] A Viterbi ldquoError bounds for convolutional codes and anasymptotically optimum decoding algorithmrdquo IEEE Transac-tions on Information Theory vol 13 no 2 pp 260ndash269 2006

[47] GD Forney Jr ldquoThe viterbi algorithmrdquoProceedings of the IEEEvol 61 no 3 pp 268ndash278 1973

[48] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the em algorithmrdquo Journalof The Royal Statistical Society Series B vol 39 no 1 pp 1ndash381977

[49] P Nectoux R Gouriveau K Medjaher et al ldquoPronostia anexperimental platform for bearings accelerated life testrdquo inProceedings of the IEEE International Conference on Prognosticsand Health Management Denver Colo USA 2012

[50] P OrsquoDonnell ldquoReport of large motor reliability survey ofindustrial and commercial installations part I and IIrdquo IEEETransactions on Industry Applications vol 21 no 4 pp 853ndash8721985

[51] P Boskoski M Gasperin D Petelin and ETH Juricic ldquoBearingfault prognostics using Renyi entropy based features and Gaus-sian process modelsrdquoMechanical Systems and Signal Processingvol 52-53 pp 327ndash337 2015

[52] B Chouri F Montero M Tabaa and A Dandache ldquoResidualuseful life estimation based on stable distribution feature extrac-tion and SVM classifierrdquo Journal of Theoretical and AppliedInformation Technology vol 55 no 3 pp 299ndash306 2013

[53] K Javed R Gouriveau N Zerhouni and P Nectoux ldquoAfeature extraction procedure basedon trigonometric functionsand cumulative descriptors to enhance prognostics modelingrdquoin Proceedings of the IEEE Conference on Prognostics and HealthManagement (PHM rsquo13) pp 1ndash7 June 2013

[54] K Medjaher N Zerhouni and J Baklouti ldquoData-driven prog-nostics based on health indicatorconstruction application topronostiarsquos datardquo in Proceedings of the 12th European ControlConference (ECC rsquo13) pp 1451ndash1456 Zurich Switzerland July2013

[55] A Mosallam K Medjaher and N Zerhouni ldquoNonparametrictime series modelling for industrial prognostics and healthmanagementrdquo International Journal of AdvancedManufacturingTechnology vol 69 no 5ndash8 pp 1685ndash1699 2013

[56] S Porotsky and Z Bluvband ldquoRemaining useful life estimationfor systems with non-trendability behaviourrdquo in Proceedings ofthe IEEE Conference on Prognostics and Health Management(PHM rsquo12) pp 1ndash6 Denver Colo USA June 2012

[57] L Serir E Ramasso and N Zerhouni ldquoAn evidential evolvingprognostic approach and itsapplication to pronostias datastreamsrdquo in Annual Conference of the Prognostics and HealthManagement Society p 9 2012

[58] F SloukiaM El Aroussi HMedromi andMWahbi ldquoBearingsprognostic using mixtureof gaussians hidden Markov model

and support vector machinerdquo in Proceedings of the ACS Inter-national Conference on Computer Systems and Applications(AICCSA rsquo13) pp 1ndash4 May 2013

[59] B Zhang L Zhang and J Xu ldquoRemaining useful life predic-tion for rolling element bearing based on ensemble learningrdquoChemical Engineering Transactions vol 33 pp 157ndash162 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 22: Hidden Semi-Markov Models for Predictive Maintenance

22 Mathematical Problems in Engineering

methodologiesrdquo in Proceedings of the IEEE Conference onPrognostics and HealthManagement (PHM rsquo12) pp 1ndash7 DenverColo USA June 2012

[12] K Goebel B Saha and A Saxena ldquoA comparison of threedata-driven techniques for prognosticsrdquo in Proceedings of the62nd Meeting of the Society for Machinery Failure PreventionTechnology April 2008

[13] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989

[14] F Cartella T Liu S Meganck J Lemeire and H Sahli ldquoOnlineadaptive learning of left-right continuous HMM for bearingscondition assessmentrdquo Journal of Physics Conference Seriesvol 364 Article ID 012031 2012 httpiopscienceioporg1742-65963641012031

[15] S Lee L Li and J Ni ldquoOnline degradation assessment andadaptive fault detection usingmodified hidden markov modelrdquoJournal of Manufacturing Science and Engineering vol 132 no2 Article ID 021010 11 pages 2010

[16] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA mixture of gaussians hiddenmarkov model for failure diag-nostic and prognosticrdquo in Proceedings of the IEEE InternationalConference on Automation Science and Engineering (CASE rsquo10)pp 338ndash343 Toronto Canada August 2010

[17] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoEstimation of the remaining useful life by using waveletpacket decomposition and HMMsrdquo in Proceedings of the IEEEAerospace Conference (AERO rsquo11) pp 1ndash10 IEEE ComputerSociety AIAA Big Sky Mont USA March 2011

[18] D A Tobon-Mejia K Medjaher N Zerhouni and G TripotldquoA data-driven failure prognostics method based on mixtureof gaussians hidden markov modelsrdquo IEEE Transactions onReliability vol 61 no 2 pp 491ndash503 2012

[19] K Medjaher D A Tobon-Mejia and N Zerhouni ldquoRemaininguseful life estimation of critical components with applicationto bearingsrdquo IEEE Transactions on Reliability vol 61 no 2 pp292ndash302 2012

[20] J Ferguson ldquoVariable duration models for speechrdquo in Proceed-ings of the Symposium on the Application of Hidden MarkovModels to Text and Speech pp 143ndash179 October 1980

[21] K P Murphy ldquoHidden semi-Markov models (hsmms)rdquo TechRep University of British Columbia 2002 httpwwwcsubccasimmurphyk

[22] A Kundu T Hines J Phillips B D Huyck and L C VanGuilder ldquoArabic handwriting recognition using variable dura-tion HMMrdquo in Proceedings of the 9th International Conferenceon Document Analysis and Recognition (ICDAR rsquo07) vol 2 pp644ndash648 IEEE Washington DC USA September 2007

[23] M T Johnson ldquoCapacity and complexity of HMM durationmodeling techniquesrdquo IEEE Signal Processing Letters vol 12 no5 pp 407ndash410 2005

[24] J-T Chien and C-H Huang ldquoBayesian learning of speechduration modelsrdquo IEEE Transactions on Speech and AudioProcessing vol 11 no 6 pp 558ndash567 2003

[25] K Laurila ldquoNoise robust speech recognition with state durationconstraintsrdquo in Proceedings of the IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo97) vol 2pp 871ndash874 IEEE Computer Society Munich Germany April1997

[26] S-Z Yu ldquoHidden semi-Markov modelsrdquo Artificial Intelligencevol 174 no 2 pp 215ndash243 2010

[27] S-Z Yu and H Kobayashi ldquoAn efficient forward-backwardalgorithm for an explicit-duration hiddenMarkovmodelrdquo IEEESignal Processing Letters vol 10 no 1 pp 11ndash14 2003

[28] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006

[29] N Wang S-D Sun Z-Q Cai S Zhang and C SayginldquoA hidden semi-markov model with duration-dependent statetransition probabilities for prognosticsrdquoMathematical Problemsin Engineering vol 2014 Article ID 632702 10 pages 2014

[30] M Azimi P Nasiopoulos and R K Ward ldquoOnline identifica-tion of hidden semi-Markov modelsrdquo in Proceedings of the 3rdInternational Symposium on Image and Signal Processing andAnalysis (ISPA rsquo03) vol 2 pp 991ndash996 Rome Italy September2003

[31] M Azimi Data transmission schemes for a new generation ofinteractive digital television [PhD dissertation] Department ofElectrical and Computer EngineeringTheUniversity of BritishColumbia Vancouver Canada 2008

[32] M Azimi P Nasiopoulos and R K Ward ldquoOffline andonline identification of hidden semi-Markov modelsrdquo IEEETransactions on Signal Processing vol 53 no 8 pp 2658ndash26632005

[33] J Q Li and A R Barron ldquoMixture density estimationrdquo inAdvances in Neural Information Processing Systems 12 pp 279ndash285 MIT Press Boston Mass USA 1999

[34] M Dong and D He ldquoA segmental hidden semi-Markov model(HSMM)-based diagnostics and prognostics framework andmethodologyrdquoMechanical Systems and Signal Processing vol 21no 5 pp 2248ndash2266 2007

[35] T Liu J Chen and G Dong ldquoApplication of continuous hidemarkov model to bearing performance degradation assess-mentrdquo in Proceedings of the 24th International Congress onConditionMonitoring andDiagnostics EngineeringManagement(COMADEM rsquo11) pp 166ndash172 2011

[36] H Ocak and K A Loparo ldquoA new bearing fault detectionand diagnosis scheme based onhidden markov modeling ofvibration signalsrdquo in Proceedings of the IEEE InternationalConference on Acoustics Speech andSignal Processing pp 3141ndash3144 IEEE Computer Society Washington DC USA 2001

[37] C Fraley and A E Raftery ldquoModel-based clustering discrimi-nant analysis and density estimationrdquo Journal of the AmericanStatistical Association vol 97 no 458 pp 611ndash631 2002

[38] K L Nylund T Asparouhov and B O Muthen ldquoDeciding onthe number of classes in latent class analysis and growthmixturemodeling aMonte Carlo simulation studyrdquo Structural EquationModeling vol 14 no 4 pp 535ndash569 2007

[39] T H Lin and C M Dayton ldquoModel selection information cri-teria for non-nested latent class modelsrdquo Journal of Educationaland Behavioral Statistics vol 22 no 3 pp 249ndash264 1997

[40] O Lukociene and J K Vermunt ldquoDetermining the numberof components in mixture models for hierarchical datardquo inAdvances in Data Analysis Data Handling and Business Intel-ligence Studies in Classification Data Analysis and KnowledgeOrganization pp 241ndash249 Springer New York NY USA 2008

[41] O Cappe E Moulines and T Ryden Inference in HiddenMarkov Models Springer Series in Statistics Springer NewYork NY USA 2005

[42] I L MacDonald and W Zucchini Hidden Markov and OtherModels for Discrete-Valued Time Series Chapman amp HallCRC1997

Mathematical Problems in Engineering 23

[43] R J MacKay ldquoEstimating the order of a hiddenMarkovmodelrdquoThe Canadian Journal of Statistics vol 30 no 4 pp 573ndash5892002

[44] S E Levinson ldquoContinuously variable duration hiddenMarkovmodels for automatic speechrecognitionrdquo Computer Speech andLanguage vol 1 no 1 pp 29ndash45 1986

[45] C D Mitchell and L H Jamieson ldquoModeling duration in ahidden Markov model with the exponential familyrdquo in Pro-ceedings of the IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP rsquo93) vol 2 pp 331ndash334Minneapolis Minn USA April 1993

[46] A Viterbi ldquoError bounds for convolutional codes and anasymptotically optimum decoding algorithmrdquo IEEE Transac-tions on Information Theory vol 13 no 2 pp 260ndash269 2006

[47] GD Forney Jr ldquoThe viterbi algorithmrdquoProceedings of the IEEEvol 61 no 3 pp 268ndash278 1973

[48] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the em algorithmrdquo Journalof The Royal Statistical Society Series B vol 39 no 1 pp 1ndash381977

[49] P Nectoux R Gouriveau K Medjaher et al ldquoPronostia anexperimental platform for bearings accelerated life testrdquo inProceedings of the IEEE International Conference on Prognosticsand Health Management Denver Colo USA 2012

[50] P OrsquoDonnell ldquoReport of large motor reliability survey ofindustrial and commercial installations part I and IIrdquo IEEETransactions on Industry Applications vol 21 no 4 pp 853ndash8721985

[51] P Boskoski M Gasperin D Petelin and ETH Juricic ldquoBearingfault prognostics using Renyi entropy based features and Gaus-sian process modelsrdquoMechanical Systems and Signal Processingvol 52-53 pp 327ndash337 2015

[52] B Chouri F Montero M Tabaa and A Dandache ldquoResidualuseful life estimation based on stable distribution feature extrac-tion and SVM classifierrdquo Journal of Theoretical and AppliedInformation Technology vol 55 no 3 pp 299ndash306 2013

[53] K Javed R Gouriveau N Zerhouni and P Nectoux ldquoAfeature extraction procedure basedon trigonometric functionsand cumulative descriptors to enhance prognostics modelingrdquoin Proceedings of the IEEE Conference on Prognostics and HealthManagement (PHM rsquo13) pp 1ndash7 June 2013

[54] K Medjaher N Zerhouni and J Baklouti ldquoData-driven prog-nostics based on health indicatorconstruction application topronostiarsquos datardquo in Proceedings of the 12th European ControlConference (ECC rsquo13) pp 1451ndash1456 Zurich Switzerland July2013

[55] A Mosallam K Medjaher and N Zerhouni ldquoNonparametrictime series modelling for industrial prognostics and healthmanagementrdquo International Journal of AdvancedManufacturingTechnology vol 69 no 5ndash8 pp 1685ndash1699 2013

[56] S Porotsky and Z Bluvband ldquoRemaining useful life estimationfor systems with non-trendability behaviourrdquo in Proceedings ofthe IEEE Conference on Prognostics and Health Management(PHM rsquo12) pp 1ndash6 Denver Colo USA June 2012

[57] L Serir E Ramasso and N Zerhouni ldquoAn evidential evolvingprognostic approach and itsapplication to pronostias datastreamsrdquo in Annual Conference of the Prognostics and HealthManagement Society p 9 2012

[58] F SloukiaM El Aroussi HMedromi andMWahbi ldquoBearingsprognostic using mixtureof gaussians hidden Markov model

and support vector machinerdquo in Proceedings of the ACS Inter-national Conference on Computer Systems and Applications(AICCSA rsquo13) pp 1ndash4 May 2013

[59] B Zhang L Zhang and J Xu ldquoRemaining useful life predic-tion for rolling element bearing based on ensemble learningrdquoChemical Engineering Transactions vol 33 pp 157ndash162 2013


