On Stochastic Dynamic Programming and its Application to Maintenance

FRANÇOIS BESNARD

Master's Degree Project
Stockholm, Sweden 2007
On Stochastic Dynamic Programming and its
Application to Maintenance
MASTER THESIS BY FRANÇOIS BESNARD
Master Thesis written at the Royal Institute of Technology, KTH School of Electrical Engineering, June 2007
Supervisors: Assistant Professor Lina Bertling (KTH), Professor Michael Patriksson (Chalmers, Applied Mathematics), Dr. Erik Dotzauer (Fortum)
Examiner: Assistant Professor Lina Bertling
XR-EE-ETK 2007:008
Abstract
Market and competition laws have been introduced among power system companies due to the restructuring and deregulation of the power system. The generating companies, as well as the transmission and distribution system operators, aim to minimize their costs. Maintenance can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.
This work focuses on an optimization methodology that could be useful for optimizing maintenance. The method, stochastic dynamic programming, is interesting because it can explicitly integrate the stochastic behavior of functional failures.
Different models based on stochastic dynamic programming are reviewed, together with the possible optimization methods to solve them. The relevance of the models in the context of maintenance optimization is discussed. An example of a multi-component replacement application is proposed to illustrate the theory.
Keywords: Maintenance Optimization, Dynamic Programming, Markov Decision Process, Power Production
Acknowledgements
First of all, I would like to thank my supervisors, who each in their own way supported me in this work: Ass. Prof. Lina Bertling for her encouragement, constructive remarks, and for giving me the opportunity of working on this project; Dr. Erik Dotzauer for many valuable inputs, discussions and comments; and Prof. Michael Patriksson for his help on mathematical writing.
Special greetings to all my friends and companions of study all over the world.
Finally, my heart turns to my parents and my love for their endless encouragement and support in my studies and life.
Stockholm, June 2007
Abbreviations
ADP    Approximate Dynamic Programming
CBM    Condition Based Maintenance
CM     Corrective Maintenance
DP     Dynamic Programming
IHSDP  Infinite Horizon Stochastic Dynamic Programming
LP     Linear Programming
MDP    Markov Decision Process
PI     Policy Iteration
PM     Preventive Maintenance
RCAM   Reliability Centered Asset Maintenance
RCM    Reliability Centered Maintenance
SDP    Stochastic Dynamic Programming
SMDP   Semi-Markov Decision Process
TBM    Time Based Maintenance
VI     Value Iteration
Notations
Numbers
M  Number of iterations for the evaluation step of modified policy iteration
N  Number of stages

Constants
α  Discount factor

Variables
i  State at the current stage
j  State at the next stage
k  Stage
m  Number of iterations left for the evaluation step of modified policy iteration
q  Iteration number for the policy iteration algorithm
u  Decision variable

State and Control Spaces
μ_k      Function mapping the states to a decision
μ*_k(i)  Optimal decision at stage k for state i
μ        Decision policy for stationary systems
μ*       Optimal decision policy for stationary systems
π        Policy
π*       Optimal policy
U_k      Decision action at stage k
U*_k(i)  Optimal decision action at stage k for state i
X_k      State at stage k

Dynamic and Cost Functions
C_k(i, u)             Cost function
C_k(i, u, j)          Cost function
C_ij(u) = C(i, u, j)  Cost function if the system is stationary
C_N(i)                Terminal cost for state i
f_k(i, u)             Dynamic function
f_k(i, u, ω)          Stochastic dynamic function
J*_k(i)               Optimal cost-to-go from stage k to N starting from state i
ω_k(i, u)             Probabilistic function of a disturbance
P_k(j, u, i)          Transition probability function
P(j, u, i)            Transition probability function for stationary systems
V(X_k)                Cost-to-go resulting from a trajectory starting from state X_k

Sets
Ω^U_k(i)  Decision space at stage k for state i
Ω^X_k     State space at stage k
Contents
1 Introduction
1.1 Background
1.2 Objective
1.3 Approach
1.4 Outline

2 Maintenance
2.1 Types of Maintenance
2.2 Maintenance Optimization Models

3 Introduction to the Power System
3.1 Power System Presentation
3.2 Costs
3.3 Main Constraints

4 Introduction to Dynamic Programming
4.1 Introduction
4.2 Deterministic Dynamic Programming

5 Finite Horizon Models
5.1 Problem Formulation
5.2 Optimality Equation
5.3 Value Iteration Method
5.4 The Curse of Dimensionality
5.5 Ideas for a Maintenance Optimization Model

6 Infinite Horizon Models - Markov Decision Processes
6.1 Problem Formulation
6.2 Optimality Equations
6.3 Value Iteration
6.4 The Policy Iteration Algorithm
6.5 Modified Policy Iteration
6.6 Average Cost-to-go Problems
6.7 Linear Programming
6.8 Efficiency of the Algorithms
6.9 Semi-Markov Decision Process

7 Approximate Methods for Markov Decision Processes - Reinforcement Learning
7.1 Introduction
7.2 Direct Learning
7.3 Indirect Learning
7.4 Supervised Learning

8 Review of Models for Maintenance Optimization
8.1 Finite Horizon Dynamic Programming
8.2 Infinite Horizon Stochastic Models
8.3 Reinforcement Learning
8.4 Conclusions

9 A Proposed Finite Horizon Replacement Model
9.1 One-Component Model
9.2 Multi-Component Model
9.3 Possible Extensions

10 Conclusions and Future Work

A Solution of the Shortest Path Example

Reference List
Chapter 1
Introduction
1.1 Background
Market and competition laws have been introduced among power system companies due to the restructuring and deregulation of modern power systems. The generating companies, as well as the transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.
Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 2.1).
CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruption, possible deterioration of the system, human risks or environmental consequences, etc.
PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components, and Condition Based Maintenance (CBM) for the components that are worthwhile and not too expensive to monitor. These maintenance actions have costs for unsupplied energy, inspection, repair, replacement, etc.
Efficient maintenance should balance the corrective and preventive maintenance to minimize the total costs of maintenance.
The probability of a functional failure for a component is stochastic. The probability depends on the state of the component, resulting from the history of the component (age, intensity of use, external stress such as weather, maintenance actions, human errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that explicitly integrate stochastic behaviors. This feature makes the models interesting and was the starting idea of this work.
1.2 Objective
The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization and to identify possible future applications in power systems.
1.3 Approach
The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.
The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.
Some SDP models found in the literature were reviewed. Conclusions were drawn about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.
A finite horizon replacement model was developed to illustrate the possible use of SDP for power system maintenance.
1.4 Outline
Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.
Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.
Chapters 4-7 focus on different Dynamic Programming (DP) approaches and algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP models are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods, respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving infinite horizon Dynamic Programming problems using approximate methods.
Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are drawn about the possible use of the different approaches in maintenance optimization.
Chapter 9 is an example of how finite horizon dynamic programming can be used for maintenance optimization.
Chapter 10 summarizes the conclusions of the work and discusses possible avenues for future research.
Chapter 2
Maintenance
The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1. Some maintenance optimization models are reviewed in Section 2.2.
2.1 Types of Maintenance
Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item, intended to retain it in, or restore it to, a state in which it can perform the required function [1]. Figure 2.1 shows a general picture of the different types of maintenance.
Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when it is not possible, or not worthwhile, to detect or prevent a failure.
Preventive maintenance aims at undertaking maintenance actions on a component before it fails, e.g. to avoid the high costs of replacement, unsupplied power delivery and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:
1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.
[Figure 2.1: Maintenance tree, based on [1]. Maintenance divides into Preventive Maintenance, comprising Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM) (continuous, scheduled or inspection based), and Corrective Maintenance.]
2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.
2.2 Maintenance Optimization Models
Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is performed too frequently, however, it can also result in very high costs.
The aim of maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.
Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influencing factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.
In this section the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].
2.2.1 Age Replacement Policies
Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.
A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.
A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.
An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson distribution (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.
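The basic age replacement trade-off can be made concrete with a small numerical sketch (not from the thesis; the Weibull lifetime parameters and the cost figures below are invented for illustration). By the standard renewal-reward argument, the long-run expected cost per unit time of replacing at age T is C(T) = [c_p R(T) + c_f (1 - R(T))] / ∫_0^T R(t) dt, where R(t) is the survival function, c_p the preventive and c_f the corrective replacement cost; this can be minimized numerically:

```python
import math

# Hypothetical example: Weibull lifetime with increasing failure rate
# (shape beta > 1) and a preventive cost c_p below the corrective cost c_f.
BETA, ETA = 2.5, 10.0   # Weibull shape and scale (years) -- assumed values
C_P, C_F = 1.0, 5.0     # preventive / corrective replacement costs -- assumed

def reliability(t):
    """Weibull survival function R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / ETA) ** BETA))

def cost_rate(T, steps=1000):
    """Long-run expected cost per unit time for replacement age T:
    C(T) = [c_p*R(T) + c_f*(1-R(T))] / integral_0^T R(t) dt."""
    dt = T / steps
    # trapezoidal integration of R(t) over [0, T] = expected cycle length
    cycle_length = sum(
        0.5 * (reliability(i * dt) + reliability((i + 1) * dt)) * dt
        for i in range(steps))
    cycle_cost = C_P * reliability(T) + C_F * (1.0 - reliability(T))
    return cycle_cost / cycle_length

# Grid search for the optimal replacement age
ages = [0.5 * i for i in range(1, 61)]   # 0.5 to 30 years
T_opt = min(ages, key=cost_rate)
print(f"optimal replacement age: {T_opt:.1f}, cost rate: {cost_rate(T_opt):.4f}")
```

With these assumed numbers, the optimum lies strictly between "replace very early" and "run to failure", illustrating why the policy only pays off when the failure rate increases with time.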
2.2.2 Block Replacement Policies
In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.
This model has been modified in [11] to capture that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
2.2.3 Condition Based Maintenance
CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify the relevant variables and their relation with failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.
One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.
For components subject to inspection at each decision epoch, one must decide whether maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times, and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.
An age replacement policy model that takes into account the information from condition monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of such a hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (monitored variables).
2.2.4 Opportunistic Maintenance Models
Opportunistic maintenance considers unexpected opportunities for performing preventive maintenance. With the failure of one component, it is possible to perform PM on other components. This can be interesting for offshore wind farms, for example: transportation to the wind farm by boat or helicopter is necessary and can be very expensive, so by grouping maintenance actions money can be saved.
Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.
A rolling horizon dynamic programming algorithm is proposed in [45] to take short-term information into account. The approach can be used with many maintenance optimization models.
2.2.5 Other Types of Models and Criteria of Classification
Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g. cannibalization models allow the re-use of some components or subcomponents of a system.
Other criteria can be used to classify maintenance optimization models. The number of components under consideration is important; e.g. multi-component models are more interesting in power systems. The time horizon considered in the model is also important. Many articles consider an infinite time horizon. More focus should be put on finite horizons, since they are more practical. Another characteristic of a model is the time representation: whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.
The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.
3.1 Power System Presentation
Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description
A simple description of the power system includes the following main parts:
1. Generation: the generation units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.
2. Transmission: the transmission system is composed of high voltage, high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.
3. Distribution: the distribution system is at a voltage level below transmission and connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).
4. Consumption: the consumers can be divided into different categories, such as industry, commercial, households, offices and agriculture. The costs of interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.
The trade of electricity between producers and consumers is made through different specific markets around the world. The rules and organization are different for each market place. The bids for electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.
The power system is controlled in real time, both automatically (by automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). The components of the system influence each other. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as blackouts.
3.1.2 Maintenance in Power Systems
The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to finding a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out by the RCAM group at the KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).
Research on power generation typically focuses on predictive maintenance using condition monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition monitoring systems.
3.2 Costs
Possible costs and incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: the cost of the maintenance team that performs the maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: the size and availability of the maintenance staff are limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can force certain maintenance actions to be postponed; e.g. in very windy conditions it is not possible to perform maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of the system (it is assumed in this thesis that the system is perfectly observable). An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:
An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should have no influence on the actual evolution of the system and the possible actions.
Basically, in maintenance problems it means that maintenance actions have an effect on the state of the system only directly after their accomplishment. They do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.
If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and the action chosen. The system is then referred to as probabilistic or stochastic.
Functional failures are in general represented as stochastic events. Consequently, stochastic maintenance optimization models are interesting.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner at all times. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 5 to 7). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the interval of time between two stages will have an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The last two possibilities will be briefly investigated in Chapter 5. Continuous decision making refers to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).
Methods exist for solving dynamic programming models exactly, and they are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves over N stages.
State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω^X_k. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω^U_k(i).
Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).
Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:
J*_0(X_0) = min_{U_0,...,U_{N-1}} [ Σ_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, ..., N-1
N          Number of stages
k          Stage
i          State at the current stage
j          State at the next stage
X_k        State at stage k
U_k        Decision action at stage k
C_k(i, u)  Cost function
C_N(i)     Terminal cost for state i
f_k(i, u)  Dynamic function
J*_0(i)    Optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:
J*_k(i) = min_{u ∈ Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]    (4.1)

J*_k(i)  Optimal cost-to-go from stage k to N starting from state i
The value iteration algorithm is a direct consequence of the optimality equation:
J*_N(i) = C_N(i)   for all i ∈ Ω^X_N

J*_k(i) = min_{u ∈ Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]   for all i ∈ Ω^X_k

U*_k(i) = argmin_{u ∈ Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]   for all i ∈ Ω^X_k

u        Decision variable
U*_k(i)  Optimal decision action at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
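The backward recursion above can be sketched in a few lines of Python (a generic sketch, not from the thesis; state spaces are passed as plain lists and the dynamic and cost functions as callables):

```python
def value_iteration(N, states, decisions, dynamics, cost, terminal_cost):
    """Backward value iteration for a finite horizon deterministic DP.

    N                 number of stages (decisions taken at k = 0..N-1)
    states[k]         iterable of states at stage k
    decisions(k, i)   admissible decisions at stage k in state i
    dynamics(k, i, u) next state f_k(i, u)
    cost(k, i, u)     stage cost C_k(i, u)
    terminal_cost(i)  terminal cost C_N(i)
    Returns (J, policy): cost-to-go J[k][i] and optimal decision policy[k][i].
    """
    J = {N: {i: terminal_cost(i) for i in states[N]}}
    policy = {}
    for k in range(N - 1, -1, -1):          # go backwards from stage N-1 to 0
        J[k], policy[k] = {}, {}
        for i in states[k]:
            # evaluate C_k(i,u) + J*_{k+1}(f_k(i,u)) for every admissible u
            best_u, best_val = None, float("inf")
            for u in decisions(k, i):
                val = cost(k, i, u) + J[k + 1][dynamics(k, i, u)]
                if val < best_val:
                    best_u, best_val = u, val
            J[k][i], policy[k][i] = best_val, best_u
    return J, policy
```

The loop structure mirrors the equations directly: the terminal condition initializes J at stage N, and each pass of the outer loop applies the minimization of equation (4.1) one stage earlier.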
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: a five-stage shortest path network. Stage 0: node A; stage 1: nodes B, C, D; stage 2: nodes E, F, G; stage 3: nodes H, I, J; stage 4: node K. Each arc is labeled with its cost; e.g., from the text, A→B costs 2, B→E costs 4, B→F costs 6, F→J costs 2 and J→K costs 7.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7 = 17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: N = 5, k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:

Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:

For k = 1, 2, 3:
Ω^U_k(i) = {0, 1}      for i = 0
Ω^U_k(i) = {0, 1, 2}   for i = 1
Ω^U_k(i) = {1, 2}      for i = 2

For k = 0:
Ω^U_0(0) = {0, 1, 2}
For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E, or U_1(0) = 1 for the transition B ⇒ F.
Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F, or u_1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.
The transition costs are defined as equal to the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J*_0(0) = min_{U_k ∈ Ω^U_k(X_k)} [ Σ_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) ]

Subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., N − 1
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal decisions determined by the DP algorithm for the sequence of states that will be visited.

The solutions of the algorithm are given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4}, with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2 and μ_1(2) = 2).
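The backward recursion can be written out in a few lines of code. The sketch below keeps the arc costs that appear in the text (A-B = 2, B-E = 4, B-F = 6, F-J = 2, J-K = 7); the remaining arc costs are invented for illustration, chosen so that the optimal solution matches the one above (J*_0(0) = 8 along A ⇒ D ⇒ G ⇒ I ⇒ K).

```python
# Backward value iteration for a stage-structured shortest path.
# Arc costs marked in the text are kept; the others are illustrative.
ARCS = {
    # (stage, node): {next_node: arc cost}
    (0, 'A'): {'B': 2, 'C': 4, 'D': 3},
    (1, 'B'): {'E': 4, 'F': 6},
    (1, 'C'): {'E': 1, 'F': 3, 'G': 5},
    (1, 'D'): {'F': 2, 'G': 2},
    (2, 'E'): {'H': 5, 'I': 7},
    (2, 'F'): {'H': 3, 'I': 2, 'J': 2},
    (2, 'G'): {'I': 1, 'J': 2},
    (3, 'H'): {'K': 4},
    (3, 'I'): {'K': 2},
    (3, 'J'): {'K': 7},
}

def solve(arcs, terminal='K', n_stages=4):
    J = {terminal: 0.0}          # terminal cost C_N(K) = 0
    policy = {}
    for k in reversed(range(n_stages)):
        for (stage, i), succ in arcs.items():
            if stage != k:
                continue
            # Optimality equation: J(i) = min_u [ C(i, u) + J(f(i, u)) ]
            u, cost = min(succ.items(), key=lambda s: s[1] + J[s[0]])
            J[i] = cost + J[u]
            policy[i] = u
    return J, policy

J, policy = solve(ARCS)
```

With these costs, the path A-B-F-J-K indeed costs 2+6+2+7 = 17, while the computed optimum J['A'] is 8 via D, G and I.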
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic, as it was in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states, which can depend on k: X_k ∈ Ω^X_k.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).
Dynamic of the System and Transition Probability
Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N − 1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).
A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that minimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system.
J*(X_0) = min_{U_k ∈ Ω^U_k(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

Subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N − 1
N             Number of stages
k             Stage
i             State at the current stage
j             State at the next stage
X_k           State at stage k
U_k           Decision (action) at stage k
ω_k(i, u)     Probabilistic function of the disturbance
C_k(j, u, i)  Cost function
C_N(i)        Terminal cost for state i
f_k(i, u, ω)  Dynamic function
J*_0(i)       Optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u ∈ Ω^U_k(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]    (5.1)
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
J*_k(i) = min_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]    (5.2)
Ω^X_k         State space at stage k
Ω^U_k(i)      Decision space at stage k for state i
P_k(j, u, i)  Transition probability function
5.3 Value Iteration Method
The value iteration (VI) algorithm for SDP problems is directly based on Equation (5.2). The algorithm starts at the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
J*_N(i) = C_N(i)    ∀ i ∈ Ω^X_N    (Initialisation)

k ← N − 1

While k ≥ 0 do:

J*_k(i) = min_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]    ∀ i ∈ Ω^X_k

U*_k(i) = argmin_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]    ∀ i ∈ Ω^X_k

k ← k − 1
u          Decision variable
U*_k(i)    Optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
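The recursion above can be sketched in code. The example below is a hypothetical two-state component (0 = working, 1 = failed) with invented stationary costs, transition probabilities and terminal costs:

```python
# Finite-horizon stochastic value iteration on an illustrative
# two-state component: state 0 = working, 1 = failed.
N = 4                       # number of stages
states = (0, 1)
actions = {0: ('wait', 'maintain'), 1: ('replace',)}

def P(j, u, i):             # transition probabilities P(j, u, i), illustrative
    table = {('wait', 0):     {0: 0.7, 1: 0.3},
             ('maintain', 0): {0: 0.95, 1: 0.05},
             ('replace', 1):  {0: 1.0}}
    return table[(u, i)].get(j, 0.0)

def C(j, u, i):             # transition costs C(j, u, i), illustrative
    action_cost = {'wait': 0.0, 'maintain': 2.0, 'replace': 10.0}[u]
    return action_cost + (5.0 if j == 1 else 0.0)   # failure penalty

CN = {0: 0.0, 1: 5.0}       # terminal cost penalising ending in failure

J = {N: dict(CN)}           # initialisation: J*_N(i) = C_N(i)
U = {}
for k in reversed(range(N)):            # backward recursion
    J[k], U[k] = {}, {}
    for i in states:
        cost, u_best = min(
            (sum(P(j, u, i) * (C(j, u, i) + J[k + 1][j]) for j in states), u)
            for u in actions[i])
        J[k][i], U[k][i] = cost, u_best
```

J[0][i] is the optimal expected cost-to-go from the initial stage, and U[k][i] the corresponding optimal decisions.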
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with
• N stages,

• N_X state variables; the size of the set for each state variable is S,

• N_U control variables; the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem thus increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
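The growth can be illustrated by evaluating the operation count for a few (invented) problem sizes:

```python
# Operation count N * S**(2*NX) * A**NU from the complexity bound above;
# the problem sizes are illustrative.
def operations(N, S, NX, A, NU):
    return N * S ** (2 * NX) * A ** NU

small = operations(10, 10, 1, 5, 1)   # one state and one control variable
large = operations(10, 10, 3, 5, 2)   # three state and two control variables
```

Here adding two state variables and one control variable multiplies the work by a factor of 50 000, from 5 000 to 250 000 000 operations.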
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for a maintenance model based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for a component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered in order to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.
Of course, maintenance states should be considered in both cases. It would also be possible to distinguish different types of failure states, such as major and minor failures. Minor failures could be cleared by repair, while after a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for each scenario, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This reduces the uncertainties, but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. If there is no consumption, some generation units are stopped, and this time can be used for maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions at offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on time, if the system dynamics are not stationary).
This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
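Such state augmentation can be sketched as follows. The deterioration model, its levels and its probabilities are invented for illustration; the point is only that the augmented state (current level, previous level) restores the Markov property:

```python
# Restoring the Markov property by state augmentation: the new state is
# (current level, previous level). Illustrative model where the wear rate
# depends on the previous level, i.e. a one-stage time lag.
def next_level_probs(current, previous):
    """P(next | current, previous): hypothetical lag-dependent wear rate."""
    trend = current - previous                 # recent degradation speed
    p_worse = 0.2 + 0.3 * max(trend, 0)        # invented numbers
    return {current: 1.0 - p_worse, current + 1: p_worse}

def augmented_transition(state):
    """Transition of the augmented state (cur, prev) -> {(next, cur): prob}."""
    cur, prev = state
    return {(nxt, cur): p for nxt, p in next_level_probs(cur, prev).items()}

probs = augmented_transition((2, 1))   # level 2 now, level 1 at the last stage
```

The transition of the augmented state depends only on the augmented state itself, so the standard SDP machinery applies, at the cost of a squared state space.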
Chapter 6
Infinite Horizon Models -
Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov decision process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω^X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω^U(i).
The objective is to find the optimal μ*, i.e. the one that minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are incurred.
J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1
μ        Decision policy
J*(i)    Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost incurred at stage k has the form α^k · C_ij(u).

Since C_ij(u) is bounded, the infinite sum converges (it is a decreasing geometric progression).
J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1
α Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:
J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1
6.2 Optimality Equations
The optimality equations are formulated using the probability function P_ij(u).
The stationary policy μ*, the solution of an IHSDP shortest path problem, satisfies Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):
J*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P_ij(u) · [ C_ij(u) + J*(j) ]    ∀ i ∈ Ω^X

J_μ(i)   Cost-to-go function of policy μ starting from state i
J*(i)    Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P_ij(u) · [ C_ij(u) + α · J*(j) ]    ∀ i ∈ Ω^X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown that it converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined for the algorithm.
An alternative to this method is the policy iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively; the process stops when a policy is the solution of its own improvement.
The algorithm starts with an initial policy μ_0. It can then be described by the following steps:
Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Otherwise, J_{μq}(i), the solution of the following linear system, is calculated:

J_{μq}(i) = Σ_{j ∈ Ω^X} P(j, μ_q(i), i) · [ C(j, μ_q(i), i) + J_{μq}(j) ]

q        Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system under the policy μ_q.
Step 2: Policy Improvement

A new policy is obtained using one value iteration step:

μ_{q+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + J_{μq}(j) ]

Go back to the policy evaluation step.
The process stops when μ_{q+1} = μ_q.

At each iteration the algorithm improves the policy. If the initial policy μ_0 is already good, the algorithm converges quickly to the optimal solution.
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μk}(i) that must be chosen higher than the real value J_{μk}(i).
While m ≥ 0 do:

J^m_{μk}(i) = Σ_{j ∈ Ω^X} P(j, μ_k(i), i) · [ C(j, μ_k(i), i) + J^{m+1}_{μk}(j) ]    ∀ i ∈ Ω^X

m ← m − 1

m        Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μk} is approximated by J^0_{μk}.
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and an arbitrarily chosen reference state X ∈ Ω^X, there are a unique λ_μ and a vector h_μ such that:

h_μ(X) = 0

λ_μ + h_μ(i) = Σ_{j ∈ Ω^X} P(j, μ(i), i) · [ C(j, μ(i), i) + h_μ(j) ]    ∀ i ∈ Ω^X

This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for every starting state.
The optimal average cost and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h*(j) ]    ∀ i ∈ Ω^X

μ*(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h*(j) ]    ∀ i ∈ Ω^X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. X is an arbitrary reference state, and h^0(i) is chosen arbitrarily.
H^k = min_{u ∈ Ω^U(X)} Σ_{j ∈ Ω^X} P(j, u, X) · [ C(j, u, X) + h^k(j) ]

h^{k+1}(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h^k(j) ] − H^k    ∀ i ∈ Ω^X

μ^{k+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h^k(j) ]    ∀ i ∈ Ω^X
The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
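A minimal sketch of relative value iteration on an invented two-state average-cost problem (the reference state, X in the text, is state 0 here). For this chain, keeping 'wait' in state 0 gives a stationary distribution (2/3, 1/3) and thus an average cost of 4/3 per stage, which the iteration should recover:

```python
# Relative value iteration on an illustrative two-state average-cost problem.
P = {('wait', 0): {0: 0.5, 1: 0.5},      # P[(u, i)] = {j: prob}, illustrative
     ('maintain', 0): {0: 1.0},
     ('repair', 1): {0: 1.0}}
COST = {('wait', 0): 0.0, ('maintain', 0): 2.0, ('repair', 1): 4.0}
ACTIONS = {0: ('wait', 'maintain'), 1: ('repair',)}
STATES = (0, 1)
X_BAR = 0                                 # reference state

def backup(i, u, h):
    return COST[(u, i)] + sum(p * h[j] for j, p in P[(u, i)].items())

h = {i: 0.0 for i in STATES}              # h^0 chosen arbitrarily
for _ in range(100):
    H = min(backup(X_BAR, u, h) for u in ACTIONS[X_BAR])        # H^k
    h = {i: min(backup(i, u, h) for u in ACTIONS[i]) - H        # h^{k+1}
         for i in STATES}

lam = min(backup(X_BAR, u, h) for u in ACTIONS[X_BAR])          # -> lambda*
policy = {i: min(ACTIONS[i], key=lambda u: backup(i, u, h)) for i in STATES}
```

Note that h(X_BAR) stays at 0 throughout, as in the formulation above.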
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Policy Evaluation
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀ i ∈ Ω^X, stop the algorithm. Otherwise, solve the system of equations:

h_q(X) = 0

λ_q + h_q(i) = Σ_{j ∈ Ω^X} P(j, μ_q(i), i) · [ C(j, μ_q(i), i) + h_q(j) ]    ∀ i ∈ Ω^X

Step 2: Policy Improvement

μ_{q+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h_q(j) ]    ∀ i ∈ Ω^X

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that cannot be included in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case:

J*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + α · J*(j) ]    ∀ i ∈ Ω^X

J*(i) is the solution of the following linear programming model:

Maximize   Σ_{i ∈ Ω^X} J(i)

Subject to J(i) − α · Σ_{j ∈ Ω^X} P(j, u, i) · J(j) ≤ Σ_{j ∈ Ω^X} P(j, u, i) · C(j, u, i)    ∀ u, i
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, it converges quite quickly if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or a decision epoch may occur each time the state of the system changes. This kind of problem is referred to as a semi-Markov decision process (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, but actions are not taken continuously (that kind of problem belongs to optimal control theory).
SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how an SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting for maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement learning (RL), or approximate dynamic programming (ADP), is an approach from machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function over a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem through approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.
The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space, and are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k), where X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ, using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.
The cost-to-go resulting from the trajectory, starting from the state X_k, is:

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k)   Cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, then J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m)   Cost-to-go of a trajectory starting from state i after the m-th visit
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)],  with γ = 1/m, where m is the number of the trajectory.
From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

where γ_{X_k} corresponds to 1/m, with m the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, and can thus only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).
At each transition of the trajectory, the cost-to-go function of every state visited so far is updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all states visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]    ∀ k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]    ∀ k = 0, ..., l
Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0, for which only the current state is updated. The TD(0) algorithm is:

J(X_l) := J(X_l) + γ_{X_l} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]
Q-factors
Once J_{μk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Q_{μk}(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + J_{μk}(j) ]

Note that C(j, u, i) must be known. The improved policy is:

μ_{k+1}(i) = argmin_{u ∈ Ω^U(i)} Q_{μk}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μk} and Q_{μk} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + J*(j) ]    (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω^U(i)} Q*(i, u)    (7.2)

By combining the two equations, we obtain:

Q*(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + min_{v ∈ Ω^U(j)} Q*(j, v) ]    (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u ∈ Ω^U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [ C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω^U(X_{k+1})} Q(X_{k+1}, u) ]

with γ defined as for TD.
The exploration/exploitation trade-off: Convergence of the algorithm to the optimal solution would require every pair (x, u) to be tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called the greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section on each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system, through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for problems of moderate size. For large state and control spaces, however, this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J_μ(i). It will be replaced by a suitable approximation J(i, r), where r is a vector that is optimized based on the available samples of J_μ. In the table representation investigated previously, J_μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J(i, r).
There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, and Bayesian statistics.
A general approach to a supervised learning problem can be:
bull Determine an adequate structure for the approximated function and corre-sponding supervised learning method
bull Determine the input features of the function that is the important inputsthat characterize the state of the system The features are generally based onexperience or insight about the problem
bull Decide of a training algorithm
bull Gathering a training set
bull Train the function with the training set The function can then be validatedusing a subset of the training set
bull Evaluate the performance of the approximated function using a test set
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training set is obtained either by simulation or from real-time samples. This is already an approximation of the real function.
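As a minimal sketch of the idea of replacing a table by a parameter vector r, the code below fits a quadratic approximation J̃(i, r) = r0 + r1·i + r2·i² to noisy samples of a hypothetical cost-to-go function by least squares. The "true" function, the noise level and all names are assumptions made for illustration.

```python
import random

# Hypothetical tabular cost-to-go sampled with noise (all values are
# illustrative assumptions): J_mu(i) = 2 + 0.5*i + 0.1*i^2 for i = 0..50.
random.seed(1)
samples = [(i, 2 + 0.5*i + 0.1*i*i + random.gauss(0.0, 1.0)) for i in range(51)]

def fit_quadratic(data):
    """Least-squares fit of J~(i, r) = r0 + r1*i + r2*i^2 (features [1, i, i^2])."""
    ata = [[0.0] * 3 for _ in range(3)]    # normal equations A^T A r = A^T b
    atb = [0.0] * 3
    for i, j in data:
        phi = [1.0, float(i), float(i * i)]
        for a in range(3):
            atb[a] += phi[a] * j
            for b in range(3):
                ata[a][b] += phi[a] * phi[b]
    m = [ata[k] + [atb[k]] for k in range(3)]   # augmented matrix
    for col in range(3):                        # Gauss-Jordan elimination
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[k][3] / m[k][k] for k in range(3)]

# Only the three parameters r are stored, instead of one value per state.
r = fit_quadratic(samples)

def approx_J(i):
    return r[0] + r[1] * i + r[2] * i * i
```

The fitted parameters generalize to any state, including states that were never sampled, which is exactly what the tabular representation cannot do.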
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. All maintenance activities are then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] an SDP model is proposed to solve a finite horizon generating-unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates
are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given: it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence
of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature were considering only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require an explicit model of the system: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes (stationary model; several classical methods possible)
  Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor
  Discounted: short-term maintenance optimization; policy iteration (PI) is faster in general
  Shortest path: linear programming allows additional constraints, but the tractable state space is more limited than with VI and PI

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: optimization for inspection-based maintenance
  Method: same as MDP (average cost-to-go approach)
  Advantages/disadvantages: more complex

Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than classical MDP methods
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to
do maintenance immediately, to be operational later and avoid maintenance during a profitable period. This idea was considered in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries a large part of the electricity is based on hydropower. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE  Number of electricity scenarios
NW  Number of working states for the component
NPM  Number of preventive maintenance states for one component
NCM  Number of corrective maintenance states for one component

Costs
CE(s, k)  Electricity cost at stage k for the electricity state s
CI  Cost per stage for interruption
CPM  Cost per stage of preventive maintenance
CCM  Cost per stage of corrective maintenance
CN(i)  Terminal cost if the component is in state i

Variables
i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space
x1k  Component state at stage k
x2k  Electricity state at stage k

Probability function
λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets
Ωx1  Component state space
Ωx2  Electricity state space
ΩU(i)  Decision space for state i

States notations
W  Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario; NX = 2. The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; Tmax can then correspond, for example, to the time when λ(t) > 50% for t > Tmax. This second approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1. The working-state transitions occur with probabilities Ts·λ(q) (to CM1) and 1 − Ts·λ(q) (to Wq+1).
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3 (electricity price in SEK/MWh as a function of the stage, for three scenarios).
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance
The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0   CM1     λ(Wq)
WNW                         0   WNW     1 − λ(WNW)
WNW                         0   CM1     λ(WNW)
Wq, q ∈ {0, ..., NW}        1   PM1     1
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   1
PMNPM−1                     ∅   W0      1
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   1
CMNCM−1                     ∅   W0      1
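The transition structure of Table 9.1 can be assembled numerically; the sketch below does so for the Figure 9.1 example (NW = 4, NPM = 2, NCM = 3, with PM2 and CM3 merged into W0), using the per-stage failure probability Ts·λ(q·Ts) from the figure. The failure rates and stage length are made-up values for illustration only.

```python
# States for the Figure 9.1 example: W0..W4, then PM1, CM1, CM2.
states = ["W0", "W1", "W2", "W3", "W4", "PM1", "CM1", "CM2"]
idx = {s: n for n, s in enumerate(states)}
Ts = 1.0                               # stage length, in the time unit of lambda
lam = [0.01, 0.02, 0.04, 0.07, 0.12]   # lambda(q*Ts) for q = 0..4 (made-up values)

def transition_matrix(u):
    """Row-stochastic matrix P[i][j] = P(j | u, i), following Table 9.1."""
    P = [[0.0] * len(states) for _ in states]
    for q in range(5):                                    # working states W0..W4
        w = idx["W%d" % q]
        if u == 1:                                        # preventive replacement starts
            P[w][idx["PM1"]] = 1.0
        else:
            p_fail = Ts * lam[q]
            P[w][idx["W%d" % min(q + 1, 4)]] = 1.0 - p_fail   # W4 ages no further
            P[w][idx["CM1"]] = p_fail
    # Maintenance states evolve deterministically, independently of u.
    P[idx["PM1"]][idx["W0"]] = 1.0     # last PM stage -> new component (NPM = 2)
    P[idx["CM1"]][idx["CM2"]] = 1.0
    P[idx["CM2"]][idx["W0"]] = 1.0     # last CM stage -> new component (NCM = 3)
    return P

P0, P1 = transition_matrix(0), transition_matrix(1)
```

Each row sums to one, which is a quick sanity check that no transition of Table 9.1 has been forgotten.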
Table 9.2: Example of transition matrices for the electricity scenarios

P1E:
1    0    0
0    1    0
0    0    1

P2E:
1/3  1/3  1/3
1/3  1/3  1/3
1/3  1/3  1/3

P3E:
0.6  0.2  0.2
0.2  0.6  0.2
0.2  0.2  0.6

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
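The stage-dependent matrices of Tables 9.2 and 9.3 can be used to propagate a scenario distribution over the horizon. The matrices and the schedule below are taken from the tables; the propagation code and the choice of starting scenario are illustrative.

```python
# Transition matrices from Table 9.2 (rows: i2, columns: j2).
P1E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2E = [[1/3] * 3 for _ in range(3)]
P3E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Stage-dependent choice of matrix over the 12-stage horizon (Table 9.3).
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def propagate(dist, Pk):
    """One-stage update of the scenario distribution: d_j = sum_i d_i * P[i][j]."""
    return [sum(dist[i] * Pk[i][j] for i in range(3)) for j in range(3)]

dist = [1.0, 0.0, 0.0]          # assume we start in scenario S1 (dry year)
for Pk in schedule:
    dist = propagate(dist, Pk)
```

After the first P2E stage the distribution becomes uniform and the remaining (doubly stochastic) matrices keep it uniform, so the scenario is fully mixed by the end of the horizon.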
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI
Moreover, a terminal cost denoted CN could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                          u   j1      Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0   CM1     CI + CCM
WNW                         0   WNW     G · Ts · CE(i2, k)
WNW                         0   CM1     CI + CCM
Wq                          1   PM1     CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   CI + CPM
PMNPM−1                     ∅   W0      CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   CI + CCM
CMNCM−1                     ∅   W0      CI + CCM
9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC  Number of components
NWc  Number of working states for component c
NPMc  Number of preventive maintenance states for component c
NCMc  Number of corrective maintenance states for component c

Costs
CPMc  Cost per stage of preventive maintenance for component c
CCMc  Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables
ic, c ∈ {1, ..., NC}  State of component c at the current stage
iNC+1  Electricity state at the current stage
jc, c ∈ {1, ..., NC}  State of component c at the next stage
jNC+1  Electricity state at the next stage
uc, c ∈ {1, ..., NC}  Decision variable for component c

State and Control Space
xck, c ∈ {1, ..., NC}  State of component c at stage k
xc  A component state
xNC+1,k  Electricity state at stage k
uck  Maintenance decision for component c at stage k

Probability functions
λc(i)  Failure probability function for component c

Sets
Ωxc  State space for component c
ΩxNC+1  Electricity state space
Ωuc(ic)  Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity space
Same as in Section 9.1.4.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)    (9.3)
The decision space for each decision variable is defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, ∅ otherwise
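The decision space of the whole system is the Cartesian product of the per-component decision spaces, which can be enumerated directly. The system snapshot below is a hypothetical example; a component with an empty decision space is represented by a single dummy decision 0.

```python
from itertools import product

# Hypothetical system snapshot: components 1 and 3 are in working states,
# component 2 is in corrective maintenance (no decision available).
component_states = ["W2", "CM1", "W0"]

def decisions_for(state):
    """Per-component decision space: {0, 1} in working states, else fixed 0."""
    return (0, 1) if state.startswith("W") else (0,)

# Cartesian product of the per-component decision spaces = system decision space.
decision_vectors = list(product(*(decisions_for(s) for s in component_states)))
```

With two free components this yields four candidate decision vectors; in general the decision space grows as 2 to the number of working components, which is one source of the curse of dimensionality discussed later.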
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1 | iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1 | iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.4.3.
Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and Uk = 0:

P((j1, ..., jNC) | 0, (i1, ..., iNC)) = Πc=1..NC P(jc, 0, ic)

Case 2
If one of the components is in maintenance, or a preventive maintenance decision is made:

P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) = Πc=1..NC Pc

with
Pc = P(jc, 1, ic) if uc = 1 or ic ∉ {W1, ..., WNWc}
Pc = 1 if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
Pc = 0 otherwise
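The two cases above can be put into a single function: when the system is up, the components age independently; when it is down, every working, non-maintained component is frozen in its state. The one-component chain used in the example (a two-state working/repair component) is an illustrative placeholder, not part of the model.

```python
def system_transition_prob(j, u, i, p_single, working):
    """P(j | u, i) for a series system, following the two cases of the text.

    p_single[c](jc, uc, ic) is the one-component transition probability and
    working(ic) tells whether ic is a working state (both assumed given).
    """
    # Case 1 applies when every component works and no maintenance is decided.
    system_up = all(working(ic) for ic in i) and not any(u)
    prob = 1.0
    for ic, jc, uc, p in zip(i, j, u, p_single):
        if system_up or uc == 1 or not working(ic):
            prob *= p(jc, uc, ic)              # component follows its own chain
        else:
            prob *= 1.0 if jc == ic else 0.0   # frozen: the system is down
    return prob

# Placeholder one-component chain (illustrative): either working ("W",
# per-stage failure probability 0.1) or under a one-stage repair ("CM").
def p_one(jc, uc, ic):
    if ic == "W":
        return {"W": 0.9, "CM": 0.1}.get(jc, 0.0)
    return 1.0 if jc == "W" else 0.0

p_single = [p_one, p_one]
working = lambda s: s == "W"

# Case 1: both components working, no maintenance -> independent product.
p1 = system_transition_prob(("W", "W"), (0, 0), ("W", "W"), p_single, working)
# Case 2: component 2 failed -> component 1 is frozen while the repair runs.
p2 = system_transition_prob(("W", "W"), (0, 0), ("W", "CM"), p_single, working)
```

In this toy example p1 = 0.9 · 0.9 = 0.81, while p2 = 1.0 because the repair of component 2 finishes with certainty and component 1 does not age meanwhile.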
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and Uk = 0:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC Cc

with
Cc = CCMc if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0 otherwise
9.3 Possible Extensions
The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model
• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of individual decision spaces for each component state variable.
• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities to apply them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which approximates a finite horizon model but must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of the complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0, 1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0, 1, 2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1, 2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0, 1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0, 1, 2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1, 2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0, 1, 2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
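The backward recursion above is mechanical enough to check by machine. The following sketch (Python is used purely as an illustration; the thesis itself contains no code) encodes the stage costs C(k, i, u) read off from the hand calculation and reproduces the optimal cost J*_0(0) = 8:

```python
# Backward value iteration for the shortest path example above.
# C[k][i] maps each decision u (the next state) to the stage cost C(k, i, u),
# with the numbers read off from the hand calculation.
C = {
    0: {0: {0: 2, 1: 4, 2: 3}},
    1: {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},
    2: {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},
    3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},  # stage 3: every state leads to the terminal state
}
N = 4
J = {N: {0: 0}}        # terminal cost phi(0) = 0
policy = {}
for k in range(N - 1, -1, -1):
    J[k], policy[k] = {}, {}
    for i, arcs in C[k].items():
        # at the last stage the only successor is the single terminal state 0
        costs = {u: c + (J[k + 1][u] if k < N - 1 else J[N][0])
                 for u, c in arcs.items()}
        policy[k][i] = min(costs, key=costs.get)
        J[k][i] = costs[policy[k][i]]

print(J[0][0], policy[0][0])   # optimal cost 8, first decision u = 2
```

The stage-by-stage dictionaries J[k] agree with the cost-to-go values derived above, which is the point of the exercise: the value iteration method is a direct transcription of the optimality equation.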
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput Oper Res, 22(4):435-441, 1995.
[3] SV Amari and LH Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464-469, 2006.
[4] N Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] YW Archibald and R Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75-83, 1996.
[6] I Bagai and K Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156-162, 1994.
[7] R E Barlow and F Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C Berenguer, C Chu, and A Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467-476, 1997.
[10] M Berg and B Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15-24, 1976.
[11] M Berg and B Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157-179, 1979.
[12] L Bertling, R Allan, and R Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75-82, 2005.
[13] D P Bertsekas and J N Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] GK Chan and S Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452-456, 2006.
[15] DI Cho and M Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1-23, 1991.
[16] R Dekker, RE Wildeman, and FA van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411-435, 1997.
[17] B Fox. Age replacement with discounting. Operations Research, 14(3):533-537, 1966.
[18] C Fu, L Ye, Y Liu, R Yu, B Iung, Y Cheng, and Y Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179-186, 2004.
[19] A Haurie and P L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387-393, 1982.
[20] P Hilber and L Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150-155, September 2004.
[21] A Jayakumar and S Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145-149, 2004.
[22] Y Jiang, Z Zhong, J McCalley, and TV Voorhis. Risk-based maintenance optimization for transmission equipment. Proc of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L P Kaelbling, M L Littman, and A P Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.
[24] D Kalles, A Stathaki, and RE King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D Kumar and U Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507-515, 1997.
[26] P L'Ecuyer and A Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117-118, 1983.
[27] M Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-5, 2006.
[28] ML Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y Mansour and S Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] MKC Marwali and SM Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31-37, 1999.
[31] RP Nicolai and R Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J Nilsson and L Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223-229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] KS Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293-294, 1988.
[35] KS Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556-560, 1993.
[36] Martin L Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176-1191, 2006.
[39] J Ribrant and L M Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167-173, 2007.
[40] J Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] CL Tomasevicz and S Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23-28, 2006.
[43] H Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469-489, 2002.
[44] L Wang, J Chu, W Mao, and Y Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R Wildeman, R Dekker, and A Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] RE Wildeman, R Dekker, and A Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Acknowledgements
Special greetings to all my friends and companions of study all over the world
Finally, my heart turns to my parents and my love, for their endless encouragements and support in my studies and life.
Stockholm June 2007
Abbreviations

ADP Approximate Dynamic Programming
CBM Condition Based Maintenance
CM Corrective Maintenance
DP Dynamic Programming
IHSDP Infinite Horizon Stochastic Dynamic Programming
LP Linear Programming
MDP Markov Decision Process
PI Policy Iteration
PM Preventive Maintenance
RCAM Reliability Centered Asset Maintenance
RCM Reliability Centered Maintenance
SDP Stochastic Dynamic Programming
SMDP Semi-Markov Decision Process
TBM Time Based Maintenance
VI Value Iteration
Notations
Numbers
M Number of iterations for the evaluation step of modified policy iteration
N Number of stages

Constant
α Discount factor

Variables
i State at the current stage
j State at the next stage
k Stage
m Number of iterations left for the evaluation step of modified policy iteration
q Iteration number for the policy iteration algorithm
u Decision variable

State and Control Space
μ_k Function mapping the states to a decision at stage k
μ*_k(i) Optimal decision at stage k for state i
μ Decision policy for stationary systems
μ* Optimal decision policy for stationary systems
π Policy
π* Optimal policy
U_k Decision action at stage k
U*_k(i) Optimal decision action at stage k for state i
X_k State at stage k

Dynamic and Cost Functions
C_k(i, u) Cost function
C_k(i, u, j) Cost function
C_ij(u) = C(i, u, j) Cost function if the system is stationary
C_N(i) Terminal cost for state i
f_k(i, u) Dynamic function
f_k(i, u, ω) Stochastic dynamic function
J*_k(i) Optimal cost-to-go from stage k to N starting from state i
ω_k(i, u) Probabilistic function of a disturbance
P_k(j, u, i) Transition probability function
P(j, u, i) Transition probability function for stationary systems
V(X_k) Cost-to-go resulting from a trajectory starting from state X_k

Sets
Ω^U_k(i) Decision space at stage k for state i
Ω^X_k State space at stage k
Contents
1 Introduction
  1.1 Background
  1.2 Objective
  1.3 Approach
  1.4 Outline
2 Maintenance
  2.1 Types of Maintenance
  2.2 Maintenance Optimization Models
3 Introduction to the Power System
  3.1 Power System Presentation
  3.2 Costs
  3.3 Main Constraints
4 Introduction to Dynamic Programming
  4.1 Introduction
  4.2 Deterministic Dynamic Programming
5 Finite Horizon Models
  5.1 Problem Formulation
  5.2 Optimality Equation
  5.3 Value Iteration Method
  5.4 The Curse of Dimensionality
  5.5 Ideas for a Maintenance Optimization Model
6 Infinite Horizon Models - Markov Decision Processes
  6.1 Problem Formulation
  6.2 Optimality Equations
  6.3 Value Iteration
  6.4 The Policy Iteration Algorithm
  6.5 Modified Policy Iteration
  6.6 Average Cost-to-go Problems
  6.7 Linear Programming
  6.8 Efficiency of the Algorithms
  6.9 Semi-Markov Decision Process
7 Approximate Methods for Markov Decision Process - Reinforcement Learning
  7.1 Introduction
  7.2 Direct Learning
  7.3 Indirect Learning
  7.4 Supervised Learning
8 Review of Models for Maintenance Optimization
  8.1 Finite Horizon Dynamic Programming
  8.2 Infinite Horizon Stochastic Models
  8.3 Reinforcement Learning
  8.4 Conclusions
9 A Proposed Finite Horizon Replacement Model
  9.1 One-Component Model
  9.2 Multi-Component Model
  9.3 Possible Extensions
10 Conclusions and Future Work
A Solution of the Shortest Path Example
Reference List
Chapter 1
Introduction
11 Background
Markets and competition laws were introduced among power system companies with the restructuring and deregulation of modern power systems. The generating companies, as well as transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.
Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 2.1).
CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruptions, possible deterioration of the system, human risks or environmental consequences, etc.
PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components, and Condition Based Maintenance (CBM) for the components that are worthwhile and not too expensive to monitor. These maintenance actions have costs for unsupplied energy, inspection, repair, replacement, etc.
Efficient maintenance should balance corrective and preventive maintenance to minimize the total costs of maintenance.
The probability of a functional failure for a component is stochastic. The probability depends on the state of the component, resulting from the history of the component (age, intensity of use, external stress such as weather, maintenance actions, human errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that explicitly integrate stochastic behaviors. This feature makes the models interesting and was the starting idea of this work.
12 Objective
The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization and to identify possible future applications in power systems.
13 Approach
The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.
The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.
Some SDP models found in the literature were reviewed. Conclusions were drawn about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.
A finite horizon replacement model was developed to illustrate the possible use ofSDP for power system maintenance
14 Outline
Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.
Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.
Chapters 4-7 focus on different Dynamic Programming (DP) approaches and algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP models are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods, respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving infinite horizon Dynamic Programming problems using approximate methods.
Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are drawn about the possible use of the different approaches in maintenance optimization.
Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization
Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.
Chapter 2
Maintenance
The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1. Some maintenance optimization models are reviewed in Section 2.2.
21 Types of Maintenance
Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 2.1 shows a general picture of the different types of maintenance.
Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way, or it is not worthwhile, to detect or prevent a failure.
Preventive maintenance aims at undertaking maintenance actions on a component before it fails, e.g. to avoid the high costs of replacement, of unsupplied power, and of possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:
1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability distribution of failure over time can be established.
Figure 2.1: Maintenance tree, based on [1]. (Figure not reproduced: it shows Maintenance split into Preventive Maintenance, itself divided into Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM) with continuous, scheduled, or inspection-based variants, and Corrective Maintenance.)
2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age related failures.
22 Maintenance Optimization Models
Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in a very high cost.
The aim of maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.
Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influencing factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.
In this section the most common classes of models are described and some referencesare given This short review is based on Chapter 8 of [4]
221 Age Replacement Policies
Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.
A model including discounting has been proposed in [17]. In this model the loss value of a replaced component decreases with its age.
A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.
An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson distribution (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures removed by minor repair, and major failures removed by replacement.
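The cost trade-off behind age replacement policies can be made concrete with the classical long-run cost-rate criterion (a textbook formulation in the spirit of [7], not a model taken from this thesis): replace preventively at age T at cost c_p, or correctively at failure at cost c_f > c_p, and minimize the expected cycle cost divided by the expected cycle length. A minimal numerical sketch, assuming an illustrative Weibull lifetime with increasing failure rate and illustrative cost values:

```python
import math

def cost_rate(T, cp, cf, beta, eta, n=2000):
    """Long-run cost per unit time of an age replacement policy at age T."""
    S = lambda t: math.exp(-((t / eta) ** beta))   # Weibull survival function
    # expected cycle length: integral of S over [0, T] (trapezoidal rule)
    xs = [i * T / n for i in range(n + 1)]
    length = sum((S(a) + S(b)) / 2 * (b - a) for a, b in zip(xs, xs[1:]))
    # expected cycle cost: preventive if the unit survives to T, corrective otherwise
    return (cp * S(T) + cf * (1 - S(T))) / length

# illustrative parameters: preventive cost 1, corrective cost 10,
# Weibull shape 2 (increasing failure rate), scale 1
cp, cf, beta, eta = 1.0, 10.0, 2.0, 1.0
grid = [0.05 * k for k in range(1, 61)]
T_star = min(grid, key=lambda T: cost_rate(T, cp, cf, beta, eta))
```

With these numbers the optimal age lies well inside the grid; with c_f close to c_p, or with a decreasing failure rate, the minimum drifts towards "never replace preventively", which is the qualitative condition stated above for the policy to make sense.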
222 Block Replacement Policies
In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T. This model has been modified in [11] to account for the operational cost of a unit being higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
223 Condition Based Maintenance
CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gear box, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.
One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.
For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize at each inspection the maintenance decision and the time to the next inspection.
An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (monitored variables).
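The proportional hazards assumption can be written compactly. In the common Cox-type form (the symbols h_0, z and γ are illustrative here, not notation from this thesis), the hazard at time t given the monitored variables z is:

```latex
% Baseline hazard h_0(t) scaled by a function of the monitored variables z
h(t \mid z) = h_0(t)\, e^{\gamma^{\top} z}
```

where h_0(t) is the baseline, time-dependent hazard and the exponential factor captures the effect of the covariates, matching the product structure described above.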
224 Opportunistic Maintenance Models
Opportunistic maintenance considers unexpected opportunities for performing preventive maintenance: with the failure of one component, it is possible to perform PM on other components. This can be interesting for offshore wind farms, for example. Travel to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money can be saved.
Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.
A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short-term information. The approach can be combined with many maintenance optimization models.
225 Other Types of Models and Criteria of Classifications
Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.
Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g., multi-component models are more interesting in power systems. The time horizon considered in the model is also important. Many articles consider an infinite time horizon. More focus should be put on finite horizons since they are more practical. Another characteristic of a model is its time representation, i.e., whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.
The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power
System
This chapter gives a brief description of electrical power systems Some costs andconstraints for a maintenance model are proposed
31 Power System Presentation
Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
311 Power System Description
A simple description of the power system includes the following main parts:

1. Generation: the generation units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.
2. Transmission: the transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.
3. Distribution: the distribution system is at a voltage level below transmission and connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).
4. Consumption: consumers can be divided into different categories, such as industry, commercial, household, office, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs also depend on the time of outage.
The trade of electricity between producers and consumers is made through different specific markets around the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.
The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as blackouts.
312 Maintenance in Power System
The objective is to find the right way to do maintenance: Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).
Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
32 Costs
Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: cost for the maintenance team that performs maintenance actions.
• Spare part cost: the cost of a new component is an important part of the maintenance cost.
• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.
• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.
• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.
• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
33 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: the size and availability of the maintenance staff is limited.
• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.
• Weather: the weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to perform maintenance on offshore wind farms.
• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed; the transportation has a price and takes time.
• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.
• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.
• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic
Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:
An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the actual evolution of the system and the possible actions.
Basically, in maintenance problems it would mean that maintenance actions have an effect on the state of the system only directly after their accomplishment. They do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the actual state and the action made.
If a system is subject to probabilistic events, it will evolve according to a probabilistic distribution depending on the actual state and action choice. The system is then referred to as probabilistic or stochastic.
Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that a system is stationary, that it evolves in the same manner all the time. Moreover, an infinite horizon optimization assumes implicitly that the system is used for an infinite time. It can be a good approximation if the lifetime of a system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the time interval between two stages will have an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuum set of decision epochs implies that a decision can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage k, the system is in a state $X_k = i$ that belongs to a state space $\Omega_{X_k}$. Depending on the state of the system, the decision maker decides on an action $u = U_k \in \Omega_{U_k}(i)$.
Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be $X_{k+1} = f_k(i, u)$. Moreover, the action has a cost that the decision maker has to pay, $C_k(i, u)$. A possible terminal cost $C_N(X_N)$ is associated with the terminal state (the state at stage N).
Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamic of the system:
$$J^*_0(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$
$$\text{subject to } X_{k+1} = f_k(X_k, U_k), \quad k = 0, \ldots, N-1$$
N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
$X_k$: state at stage k
$U_k$: decision/action at stage k
$C_k(i, u)$: cost function
$C_N(i)$: terminal cost for state i
$f_k(i, u)$: dynamic function
$J^*_0(i)$: optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:
$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \quad (4.1)$$

$J^*_k(i)$: optimal cost-to-go from stage k to N, starting from state i
The value iteration algorithm is a direct consequence of the optimality equation
$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega_{X_N}$$
$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \quad \forall i \in \Omega_{X_k}$$
$$U^*_k(i) = \arg\min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \quad \forall i \in \Omega_{X_k}$$
u: decision variable
$U^*_k(i)$: optimal decision/action at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
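As an illustration, the backward recursion can be sketched in a few lines of code. The model passed to `value_iteration` below (states, arc costs, terminal costs) is hypothetical and chosen only to keep the sketch short; the function itself follows the equations above.

```python
# Backward value iteration for a deterministic finite-horizon DP.
# The model used below is hypothetical, chosen only to illustrate the
# recursion J*_k(i) = min_u [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ].

def value_iteration(n_stages, states, actions, dynamic, cost, terminal_cost):
    """Return the optimal cost-to-go J and decisions U for each stage/state."""
    J = {n_stages: {i: terminal_cost(i) for i in states(n_stages)}}
    U = {}
    for k in range(n_stages - 1, -1, -1):      # backwards, stops at k = 0
        J[k], U[k] = {}, {}
        for i in states(k):
            J[k][i], U[k][i] = min(
                (cost(k, i, u) + J[k + 1][dynamic(k, i, u)], u)
                for u in actions(k, i)
            )
    return J, U

# Tiny 2-stage example: states {0, 1} at every stage; action u moves to node u.
J, U = value_iteration(
    n_stages=2,
    states=lambda k: [0, 1],
    actions=lambda k, i: [0, 1],
    dynamic=lambda k, i, u: u,            # f_k(i, u) = u
    cost=lambda k, i, u: abs(i - u) + 1,  # hypothetical arc costs
    terminal_cost=lambda i: 2 * i,        # hypothetical terminal cost C_N(i)
)
print(J[0][0], U[0][0])
```

The dictionaries `J` and `U` hold, for every stage and state, the optimal cost-to-go and the optimal first decision, exactly as the backward recursion produces them.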
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces. An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: shortest path network with node A at stage 0; nodes B, C, D at stage 1; nodes E, F, G at stage 2; nodes H, I, J at stage 3; and terminal node K at stage 4. Each arc is labeled with its cost.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:
$$\Omega_{X_0} = \{A\} = \{0\}$$
$$\Omega_{X_1} = \{B, C, D\} = \{0, 1, 2\}$$
$$\Omega_{X_2} = \{E, F, G\} = \{0, 1, 2\}$$
$$\Omega_{X_3} = \{H, I, J\} = \{0, 1, 2\}$$
$$\Omega_{X_4} = \{K\} = \{0\}$$
Each node of the problem is defined by a state $X_k$. For example, $X_2 = 1$ corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which $X_k$ would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:
$$\Omega_{U_k}(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \quad \text{for } k = 1, 2, 3$$
$$\Omega_{U_0}(0) = \{0, 1, 2\} \quad \text{for } k = 0$$
For example, $\Omega_{U_1}(0) = \Omega_U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition $B \Rightarrow E$ or $U_1(0) = 1$ for the transition $B \Rightarrow F$.
Another example: $\Omega_{U_1}(2) = \Omega_U(D) = \{1, 2\}$, with $u_1(2) = 1$ for the transition $D \Rightarrow F$ or $u_1(2) = 2$ for the transition $D \Rightarrow G$.
A sequence $\pi = \{\mu_0, \mu_1, \ldots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu^*_0, \mu^*_1, \ldots, \mu^*_N\}$.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: $f_k(i, u) = u$.
The transition costs are defined as equal to the distance from one state to the state resulting from the decision. For example, $C_1(0, 0) = C(B \Rightarrow E) = 4$. The cost function is defined in the same way for the other stages and states.
Objective Function
$$J^*_0(0) = \min_{U_k \in \Omega_{U_k}(X_k)} \left[ \sum_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) \right]$$
$$\text{subject to } X_{k+1} = f_k(X_k, U_k), \quad k = 0, 1, \ldots, N-1$$
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.
The solutions of the algorithm are given in Appendix A.
The optimal cost-to-go is $J^*_0(0) = 8$. It corresponds to the following path: $A \Rightarrow D \Rightarrow G \Rightarrow I \Rightarrow K$. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$ with $\mu_k(i) = u^*_k(i)$ (for example, $\mu_1(1) = 2$, $\mu_1(2) = 2$).
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable $k \in \{0, \ldots, N\}$ represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on k: $X_k \in \Omega_{X_k}$.
Decision Space
At each decision epoch, the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on the stage: $u \in \Omega_{U_k}(i)$.
Dynamics of the System and Transition Probabilities
In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance $\omega = \omega_k(i, u)$:
$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \ldots, N-1$$
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:
$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$
If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:
$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
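A minimal sketch of this reduction, with a hypothetical two-state, two-action model (the state labels, probabilities, and the policy `mu` are invented for illustration):

```python
# Fixing a control u = mu(i) in every state reduces the transition
# probabilities P(j, u, i) to an ordinary Markov chain over the states.
# The two-state model and the policy mu below are hypothetical.

# P[u][i][j] = P(X_{k+1} = j | X_k = i, U_k = u)
P = {
    "do_nothing": [[0.6, 0.4], [0.1, 0.9]],
    "repair":     [[0.9, 0.1], [0.8, 0.2]],
}
mu = {0: "do_nothing", 1: "repair"}   # one fixed control per state

# Row i of the induced Markov chain is row i of P under action mu(i).
P_mu = [P[mu[i]][i] for i in (0, 1)]
print(P_mu)
```

The resulting matrix `P_mu` is exactly the Markov model mentioned above: once the policy is fixed, the decision problem disappears and only a Markov chain remains.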
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:
$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$
If the transition (i, j) occurs at stage k when the decision is u, then a cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, then the notation is simplified to $C(j, u, i)$.
A terminal cost $C_N(i)$ can be used to penalize deviation from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:
$$J^*(X_0) = \min_{U_k \in \Omega_{U_k}(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$
$$\text{subject to } X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), \quad k = 0, 1, \ldots, N-1$$
N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
$X_k$: state at stage k
$U_k$: decision/action at stage k
$\omega_k(i, u)$: probabilistic function of the disturbance
$C_k(j, u, i)$: cost function
$C_N(i)$: terminal cost for state i
$f_k(i, u, \omega)$: dynamic function
$J^*_0(i)$: optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is
$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} E\left[ C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega)) \right] \quad (5.1)$$
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \quad (5.2)$$
$\Omega_{X_k}$: state space at stage k
$\Omega_{U_k}(i)$: decision space at stage k for state i
$P_k(j, u, i)$: transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on Equation (5.2). The algorithm starts from the last stage. By backward recursions, it determines at each stage the optimal decision for each state of the system.
$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega_{X_N} \quad \text{(initialisation)}$$
While $k \ge 0$ do:
$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega_{X_k}$$
$$U^*_k(i) = \arg\min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega_{X_k}$$
$$k \leftarrow k - 1$$
u: decision variable
$U^*_k(i)$: optimal decision/action at stage k for state i
The recursion finishes when the first stage is reached
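The backward recursion can be sketched as follows. The two-state, two-action model (probabilities, costs, and terminal costs, loosely labelled as a working/failed component) is hypothetical and kept stationary only for brevity; the loop itself implements Equation (5.2).

```python
# Backward value iteration for finite-horizon stochastic DP.
# The two-state model below is hypothetical; probabilities and costs
# are stationary here only to keep the sketch short.

N = 3                      # number of stages
states = [0, 1]            # 0 = working, 1 = failed (hypothetical labels)
actions = [0, 1]           # 0 = do nothing, 1 = repair/replace

# P[u][i][j] and C[u][i][j]: transition probability and cost for (i, u) -> j
P = {0: [[0.7, 0.3], [0.0, 1.0]], 1: [[1.0, 0.0], [1.0, 0.0]]}
C = {0: [[0.0, 5.0], [0.0, 5.0]], 1: [[1.0, 0.0], [10.0, 0.0]]}
C_terminal = [0.0, 5.0]

J = [None] * N + [list(C_terminal)]          # J[N][i] = C_N(i)
U = [None] * N
for k in range(N - 1, -1, -1):               # backwards: k = N-1, ..., 0
    J[k], U[k] = [0.0] * len(states), [0] * len(states)
    for i in states:
        costs = {
            u: sum(P[u][i][j] * (C[u][i][j] + J[k + 1][j]) for j in states)
            for u in actions
        }
        U[k][i] = min(costs, key=costs.get)  # argmin over admissible actions
        J[k][i] = costs[U[k][i]]
print(J[0], U[0])
```

For this invented model the recursion ends with the optimal expected cost-to-go from stage 0 and the optimal first decision for each state.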
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:
• N stages,
• $N_X$ state variables, where the size of the set for each state variable is S,
• $N_U$ control variables, where the size of the set for each control variable is A.
The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
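A quick arithmetic sketch of this growth; the horizon, set sizes, and variable counts below are arbitrary illustrative figures, not values from any particular model.

```python
# The operation count O(N * S^(2*NX) * A^(NU)) grows exponentially in the
# numbers of state and control variables; all figures below are illustrative.

def dp_operations(N, S, NX, A, NU):
    # one backup per (stage, state, action) tuple, each summing over all states
    return N * S ** (2 * NX) * A ** NU

for NX in (1, 2, 3):          # adding state variables with 10 values each
    print(NX, dp_operations(N=10, S=10, NX=NX, A=3, NU=2))
```

Each extra state variable multiplies the work by $S^2$ (here a factor of 100), which is why exact SDP quickly becomes intractable.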
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for the component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.
Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure a component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model on its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions on offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamic of the system only depends on the actual state of the system (and possibly on the time, if the system dynamic is not stationary).
This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep in memory the preceding states that can be visited. The computational price is once again very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamic of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time. The dynamic of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introduction chapter of [13] are recommended.
In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form $\pi = \{\mu, \mu, \mu, \ldots\}$. $\mu$ is a function mapping the state space to the control space: for $i \in \Omega_X$, $\mu(i)$ is an admissible control for the state i, $\mu(i) \in \Omega_U(i)$.
The objective is to find the optimal $\mu^*$, i.e. the one that minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is inevitable. When this state is reached, the system remains in it and no further costs are paid.
$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$
$$\text{subject to } X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$$
$\mu$: decision policy
$J^*(i)$: optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor $\alpha$ ($0 < \alpha < 1$). The cost function for discounted IHSDP has the form $\alpha^k \cdot C_{ij}(u)$. As $C_{ij}(u)$ is bounded, the infinite sum will converge (decreasing geometric progression).
$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$
$$\text{subject to } X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$$
$\alpha$: discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting. To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize:
$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$
$$\text{subject to } X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$$
6.2 Optimality Equations
The optimality equations are formulated using the probability function $P(j, u, i)$.
The stationary policy $\mu^*$, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \quad \forall i \in \Omega_X$$

$J_\mu(i)$: cost-to-go function of policy $\mu$ starting from state i
$J^*(i)$: optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:
$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega_X$$
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively the algorithm should converge to the optimal policy, and it can be shown that the algorithm does indeed converge to the optimal solution. If the model is discounted, then the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and $1/(1-\alpha)$. For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
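A minimal sketch of value iteration for a discounted model. The two-state MDP below (probabilities and costs, loosely a working/failed component with a repair action) is hypothetical; the iteration stops when the value function changes by less than a small tolerance.

```python
# Value iteration for a discounted infinite-horizon MDP (hypothetical model).

alpha = 0.9                     # discount factor
states, actions = [0, 1], [0, 1]
# P[u][i][j] and C[u][i][j]: transition probability and cost for (i, u) -> j
P = {0: [[0.8, 0.2], [0.0, 1.0]], 1: [[1.0, 0.0], [1.0, 0.0]]}
C = {0: [[0.0, 4.0], [0.0, 4.0]], 1: [[1.0, 0.0], [6.0, 0.0]]}

J = [0.0, 0.0]
for _ in range(10_000):
    J_new = [min(
        sum(P[u][i][j] * (C[u][i][j] + alpha * J[j]) for j in states)
        for u in actions
    ) for i in states]
    if max(abs(a - b) for a, b in zip(J, J_new)) < 1e-10:
        break                   # practical stopping criterion
    J = J_new

# Greedy policy with respect to the (near-)converged value function
mu = [min(actions, key=lambda u, i=i: sum(
        P[u][i][j] * (C[u][i][j] + alpha * J[j]) for j in states))
      for i in states]
print([round(v, 3) for v in J], mu)
```

For this invented model the fixed point can be checked by hand: always repairing gives $J(0) = 1 + \alpha J(0)$, i.e. $J(0) = 10$, and $J(1) = 6 + \alpha J(0) = 15$.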
6.4 The Policy Iteration Algorithm
Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy $\mu^0$. It can then be described by the following steps.
Step 1: Policy Evaluation
If $\mu^{q+1} = \mu^q$, stop the algorithm. Else, $J_{\mu^q}(i)$, the solution of the following linear system, is calculated:
$$J_{\mu^q}(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J_{\mu^q}(j) \right] \quad \forall i \in \Omega_X$$
q: iteration number for the policy iteration algorithm
This is the expected cost-to-go function of the system using the policy $\mu^q$.
Step 2: Policy Improvement
A new policy is obtained using the value iteration algorithm:
$$\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^q}(j) \right] \quad \forall i \in \Omega_X$$
Go back to the policy evaluation step.
The process stops when $\mu^{q+1} = \mu^q$.
At each iteration the algorithm always improves the policy. If the initial policy $\mu^0$ is already good, then the algorithm will converge fast to the optimal solution.
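The two steps can be sketched as follows on a hypothetical two-state discounted model (the probabilities and costs are invented): the evaluation step solves the 2×2 linear system exactly by Cramer's rule, and the improvement step is the greedy update.

```python
# Policy iteration for a discounted two-state MDP (hypothetical model).

alpha = 0.9
states, actions = [0, 1], [0, 1]
P = {0: [[0.8, 0.2], [0.0, 1.0]], 1: [[1.0, 0.0], [1.0, 0.0]]}
C = {0: [[0.0, 4.0], [0.0, 4.0]], 1: [[1.0, 0.0], [6.0, 0.0]]}

def solve2(a, b, c, d, e, f):
    """Solve the 2x2 system [[a, b], [c, d]] x = [e, f] by Cramer's rule."""
    det = a * d - b * c
    return [(e * d - b * f) / det, (a * f - c * e) / det]

def evaluate(mu):
    # Policy evaluation: J(i) = c_mu(i) + alpha * sum_j P_mu(i,j) J(j),
    # i.e. the linear system (I - alpha * P_mu) J = c_mu.
    c = [sum(P[mu[i]][i][j] * C[mu[i]][i][j] for j in states) for i in states]
    p = [P[mu[i]][i] for i in states]
    return solve2(1 - alpha * p[0][0], -alpha * p[0][1],
                  -alpha * p[1][0], 1 - alpha * p[1][1], c[0], c[1])

mu = [0, 0]                           # initial policy mu^0
while True:
    J = evaluate(mu)                  # Step 1: policy evaluation
    mu_new = [min(actions, key=lambda u, i=i: sum(   # Step 2: improvement
        P[u][i][j] * (C[u][i][j] + alpha * J[j]) for j in states))
        for i in states]
    if mu_new == mu:                  # policy reproduces itself: stop
        break
    mu = mu_new
print(mu, [round(v, 3) for v in J])
```

On this invented model the algorithm terminates after two iterations with the repair-everywhere policy and its exact cost-to-go.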
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu^k}(i)$ that must be chosen higher than the real value $J_{\mu^k}(i)$.
While $m \ge 0$ do:
$$J^m_{\mu^k}(i) = \sum_{j \in \Omega_X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right] \quad \forall i \in \Omega_X$$
$$m \leftarrow m - 1$$
m: number of iterations left for the evaluation step of modified policy iteration
The algorithm stops when m = 0, and $J_{\mu^k}$ is approximated by $J^0_{\mu^k}$.
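A sketch of the same idea, on a hypothetical two-state discounted model (probabilities, costs, and the starting value 100 are invented): the evaluation step runs only M sweeps of the fixed-policy backup, starting from a value function deliberately above the true one.

```python
# Modified policy iteration: truncated evaluation (M sweeps) + greedy improvement.
# The model is hypothetical; the initial J is chosen above the true cost-to-go.

alpha, M = 0.9, 50
states, actions = [0, 1], [0, 1]
P = {0: [[0.8, 0.2], [0.0, 1.0]], 1: [[1.0, 0.0], [1.0, 0.0]]}
C = {0: [[0.0, 4.0], [0.0, 4.0]], 1: [[1.0, 0.0], [6.0, 0.0]]}

def evaluate_approx(mu, J_init, M):
    """Approximate J_mu with M sweeps of the fixed-policy backup."""
    J = list(J_init)
    for _ in range(M):
        J = [sum(P[mu[i]][i][j] * (C[mu[i]][i][j] + alpha * J[j])
                 for j in states) for i in states]
    return J

mu, J = [0, 0], [100.0, 100.0]        # initial policy and high initial values
for _ in range(20):
    J = evaluate_approx(mu, J, M)     # truncated policy evaluation
    mu_new = [min(actions, key=lambda u, i=i: sum(
        P[u][i][j] * (C[u][i][j] + alpha * J[j]) for j in states))
        for i in states]
    if mu_new == mu:
        break
    mu = mu_new
print(mu, [round(v, 2) for v in J])
```

Compared with exact policy iteration, the cost-to-go is only approximate (here within a few percent of the exact values 10 and 15), but each evaluation step avoids solving a linear system.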
6.6 Average Cost-to-go Problems
The methods presented in the previous sections can not be applied directly to average cost problems. Average cost-to-go problems are more complicated, and impose conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy $\mu$ and a state $\bar{X} \in \Omega_X$, there is a unique $\lambda_\mu$ and vector $h_\mu$ such that:
$$h_\mu(\bar{X}) = 0$$
$$\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right] \quad \forall i \in \Omega_X$$
This $\lambda_\mu$ is the average cost-to-go for the stationary policy $\mu$. The average cost-to-go is the same for all starting states.
The optimal average cost and the optimal policy satisfy the Bellman equation:
$$\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega_X$$
$$\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega_X$$
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. $\bar{X}$ is an arbitrary state and $h^0(i)$ is chosen arbitrarily:
$$H^k = \min_{u \in \Omega_U(\bar{X})} \sum_{j \in \Omega_X} P(j, u, \bar{X}) \cdot \left[ C(j, u, \bar{X}) + h^k(j) \right]$$
$$h^{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \quad \forall i \in \Omega_X$$
$$\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \quad \forall i \in \Omega_X$$
The sequence $h^k$ will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.
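A sketch of relative value iteration on a hypothetical unichain two-state model (all probabilities and costs are invented; state 0 plays the role of the reference state $\bar{X}$):

```python
# Relative value iteration for an average cost-per-stage problem,
# on a hypothetical unichain two-state model.

states, actions = [0, 1], [0, 1]
# P[u][i][j], C[u][i][j]: transition probability and cost for (i, u) -> j
P = {0: [[0.5, 0.5], [0.0, 1.0]], 1: [[0.9, 0.1], [1.0, 0.0]]}
C = {0: [[0.0, 4.0], [0.0, 4.0]], 1: [[1.0, 1.0], [6.0, 6.0]]}
ref = 0                                    # arbitrary reference state X-bar

def backup(h, i, u):
    return sum(P[u][i][j] * (C[u][i][j] + h[j]) for j in states)

h = [0.0, 0.0]                             # h^0 chosen arbitrarily
for _ in range(500):
    H = min(backup(h, ref, u) for u in actions)            # offset at X-bar
    h = [min(backup(h, i, u) for u in actions) - H for i in states]
mu = [min(actions, key=lambda u, i=i: backup(h, i, u)) for i in states]
print(round(H, 4), mu)                     # H approaches the average cost
```

Subtracting $H^k$ keeps $h(\bar{X}) = 0$ at every iteration, and $H^k$ itself converges to the optimal average cost per stage (16/11 ≈ 1.4545 for this invented model, achieved by repairing in both states).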
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: $\bar{X}$ can be chosen arbitrarily.
Step 1: Evaluation of the policy
If $\lambda^{q+1} = \lambda^q$ and $h^{q+1}(i) = h^q(i)$ $\forall i \in \Omega_X$, stop the algorithm. Else, solve the system of equations:
$$h^q(\bar{X}) = 0$$
$$\lambda^q + h^q(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right] \quad \forall i \in \Omega_X$$
Step 2: Policy improvement
$$\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right] \quad \forall i \in \Omega_X$$
$$q = q + 1$$
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case the optimality equation is:
$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega_X$$
$J^*(i)$ is then the solution of the following linear programming model:
$$\text{Maximize } \sum_{i \in \Omega_X} J(i)$$
$$\text{subject to } J(i) - \alpha \sum_{j \in \Omega_X} P(j, u, i) \cdot J(j) \le \sum_{j \in \Omega_X} P(j, u, i) \cdot C(j, u, i) \quad \forall u, i$$
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].
Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy $\mu^0$ is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).
SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Process - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) to be able to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form $(X_k, X_{k+1}, U_k, C_k)$.
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ, using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.
The cost-to-go resulting from the trajectory starting from state X_k is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

where V(X_k) is the cost-to-go of a trajectory starting from state X_k.
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

where V(i_m) is the cost-to-go of the trajectory starting from state i at its m-th visit.
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view,

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

where γ_{X_k} corresponds to 1/m, m being the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be computed from the whole trajectory, so the updates can only be made once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).

At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],  ∀k = 0, ..., l
TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],  ∀k = 0, ..., l
Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
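As an illustration, the TD(0) update above can be sketched as a small tabular program for a stochastic shortest path problem. This is a minimal sketch, not an implementation from the thesis: the state list, the terminal state and the sampler `step` (which stands in for simulation or real-life samples under the fixed policy) are hypothetical.

```python
import random

def td0_evaluate(states, terminal, step, episodes=2000):
    """Tabular TD(0) evaluation of a fixed policy on a stochastic
    shortest path problem. `step(x)` is a hypothetical sampler that
    returns (next_state, cost) under the policy being evaluated."""
    J = {x: 0.0 for x in states}        # cost-to-go estimates J(x)
    visits = {x: 0 for x in states}     # visit counts, for gamma = 1/m
    for _ in range(episodes):
        x = random.choice(states)       # start a trajectory anywhere
        while x != terminal:
            x_next, cost = step(x)
            visits[x] += 1
            gamma = 1.0 / visits[x]
            # TD(0) update: J(x) <- J(x) + gamma [C(x, x') + J(x') - J(x)]
            J[x] += gamma * (cost + J[x_next] - J[x])
            x = x_next
    return J
```

On a deterministic chain with unit stage costs, for example, the estimates converge to the exact cost-to-go values.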
Q-factors. Once J^{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^{μ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J^{μ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = arg min_{u∈Ω_U(i)} Q^{μ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^{μ_k} and Q^{μ_k} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = arg min_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The exploration/exploitation trade-off. Convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called a greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
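The Q-learning update and the exploration/exploitation trade-off can be sketched together as a tabular program. This is a hypothetical illustration with an ε-greedy rule: the environment sampler `step(x, u)` stands in for the observed samples (X_k, X_{k+1}, U_k, C_k), and no explicit model P(j, u, i) is needed.

```python
import random

def q_learning(states, actions, terminal, step, episodes=3000, eps=0.3):
    """Tabular Q-learning with epsilon-greedy exploration. `step(x, u)` is
    a hypothetical environment sampler returning (next_state, cost)."""
    Q = {(x, u): 0.0 for x in states for u in actions}
    visits = {(x, u): 0 for x in states for u in actions}
    for _ in range(episodes):
        x = random.choice(states)
        while x != terminal:
            # exploration/exploitation trade-off
            if random.random() < eps:
                u = random.choice(actions)                  # explore
            else:
                u = min(actions, key=lambda a: Q[(x, a)])   # greedy policy
            x_next, cost = step(x, u)
            visits[(x, u)] += 1
            gamma = 1.0 / visits[(x, u)]
            best_next = 0.0 if x_next == terminal else min(
                Q[(x_next, a)] for a in actions)
            # update based on (7.3): Q <- (1-gamma) Q + gamma [C + min_v Q(j, v)]
            Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * (cost + best_next)
            x = x_next
    return Q
```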
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system, through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces, this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J^μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that is optimized based on the available samples of J^μ. In the tabular representation investigated previously, J^μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.
Function approximators must generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).
There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.
A general approach to a supervised learning problem is:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem

• Decide on a training algorithm

• Gather a training set

• Train the function with the training set. The function can then be validated using a subset of the training set

• Evaluate the performance of the approximated function using a test set
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
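As a minimal illustration of an approximation architecture J̃(i, r), a linear structure r_0 + r_1 · i can be fitted to sampled cost-to-go values in closed form. The feature choice (the raw state index) and the sample values below are hypothetical; the methods named above (neural networks, kernel-based methods, etc.) would replace this simple structure in practice.

```python
def fit_linear(samples):
    """Least-squares fit of the linear architecture J~(i, r) = r0 + r1 * i
    to (state, observed cost-to-go) pairs."""
    n = len(samples)
    mean_i = sum(i for i, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # closed-form simple linear regression minimizing sum (J~(i, r) - v)^2
    s_ii = sum((i - mean_i) ** 2 for i, _ in samples)
    s_iv = sum((i - mean_i) * (v - mean_v) for i, v in samples)
    r1 = s_iv / s_ii
    r0 = mean_v - r1 * mean_i
    return r0, r1

# Hypothetical samples (state, cost-to-go) gathered from simulated trajectories.
samples = [(0, 8.1), (1, 6.2), (2, 3.9), (3, 2.1), (4, 0.2)]
r0, r1 = fit_linear(samples)
J_approx = lambda i: r0 + r1 * i   # usable for any state i
```

Only the two numbers (r_0, r_1) are stored, however many states there are; this is the point of the approximation structure.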
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure during the stage of a unit not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlangian distribution. Preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the calculated state probabilities and the optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given, considering 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization, using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance optimization, scheduling
  Method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; possible approaches: average cost-to-go, discounted, shortest path
  Possible applications: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
  Methods: classical methods for MDP: value iteration (VI), policy iteration (PI), linear programming
  Advantages/disadvantages: VI can converge fast for a high discount factor; PI is faster in general; linear programming allows additional constraints, but the tractable state space is more limited than for VI and PI

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application: optimization of inspection-based maintenance
  Method: same as MDP (average cost-to-go approach)
  Advantages/disadvantages: complex

Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than classical MDP methods
  Possible application: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component, and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
Conversely, if a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was incorporated into the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers

N_E : Number of electricity scenarios
N_W : Number of working states for the component
N_PM : Number of preventive maintenance states for the component
N_CM : Number of corrective maintenance states for the component

Costs

C_E(s, k) : Electricity price at stage k in electricity state s
C_I : Cost per stage for interruption
C_PM : Cost per stage of preventive maintenance
C_CM : Cost per stage of corrective maintenance
C_N(i) : Terminal cost if the component is in state i

Variables

i1 : Component state at the current stage
i2 : Electricity state at the current stage
j1 : Possible component state for the next stage
j2 : Possible electricity state for the next stage

State and Control Space

x1_k : Component state at stage k
x2_k : Electricity state at stage k

Probability functions

λ(t) : Failure rate of the component at age t
λ(i) : Failure rate of the component in state W_i

Sets

Ω_x1 : Component state space
Ω_x2 : Electricity state space
Ω_U(i) : Decision space for state i

State notations

W : Working state
PM : Preventive maintenance state
CM : Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption, C_I per stage, is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector X_k is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (N_X = 2).

The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k),  x1_k ∈ Ω_x1, x2_k ∈ Ω_x2   (9.1)

Ω_x1 is the set of possible states for the component, and Ω_x2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This latter approach was implemented. In both cases, the corresponding number of W states is N_W = Tmax/Ts, or the closest integer.
Figure 9.1: Example of a Markov decision process for one component with N_CM = 3, N_PM = 2, N_W = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{N_PM−1}, CM1, ..., CM_{N_CM−1}}
Electricity scenario state

The electricity scenarios are associated with the state variable x2_k. There are N_E possible states for this variable, each corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, N_E = 3 (electricity price in SEK/MWh versus stage).
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0 : no preventive maintenance
U_k = 1 : preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW}, ∅ otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
  = P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
  = P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
  = P(j1, u, i1) · P_k(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios over a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          | u | j1    | P(j1, u, i1)
Wq, q ∈ {0, ..., N_W−1}     | 0 | Wq+1  | 1 − λ(Wq)
Wq, q ∈ {0, ..., N_W−1}     | 0 | CM1   | λ(Wq)
W_NW                        | 0 | W_NW  | 1 − λ(W_NW)
W_NW                        | 0 | CM1   | λ(W_NW)
Wq, q ∈ {0, ..., N_W}       | 1 | PM1   | 1
PMq, q ∈ {1, ..., N_PM−2}   | ∅ | PMq+1 | 1
PM_{N_PM−1}                 | ∅ | W0    | 1
CMq, q ∈ {1, ..., N_CM−2}   | ∅ | CMq+1 | 1
CM_{N_CM−1}                 | ∅ | W0    | 1
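The nonzero transition probabilities of Table 9.1 can be sketched in code. The numerical values below (stage length and failure rates λ(W_q)) are invented for illustration, chosen to match the structure of the Figure 9.1 example (N_W = 4, N_PM = 2, N_CM = 3).

```python
# Hypothetical parameters; lambda(W_q) values are invented for illustration.
Ts = 1.0                                   # stage length (hours, say)
lam = [0.05, 0.08, 0.12, 0.18, 0.25]       # lambda(W_q) for q = 0..N_W
N_W, N_PM, N_CM = 4, 2, 3

states = ([f"W{q}" for q in range(N_W + 1)]
          + [f"PM{q}" for q in range(1, N_PM)]
          + [f"CM{q}" for q in range(1, N_CM)])

def transition(i1, u=None):
    """Return {j1: P(j1, u, i1)} following Table 9.1."""
    if i1.startswith("W"):
        q = int(i1[1:])
        if u == 1:
            return {"PM1": 1.0}                        # preventive replacement
        p_fail = Ts * lam[q]                           # failure during the stage
        j_ok = f"W{q + 1}" if q < N_W else f"W{N_W}"   # age one stage, or stay
        return {j_ok: 1.0 - p_fail, "CM1": p_fail}
    kind, q = i1[:2], int(i1[2:])                      # a PM or CM state
    n_stages = N_PM if kind == "PM" else N_CM
    # maintenance progresses; the last stage yields a new component (W0)
    j = f"{kind}{q + 1}" if q < n_stages - 1 else "W0"
    return {j: 1.0}
```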
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = | 1   0   0 |    P2_E = | 1/3 1/3 1/3 |    P3_E = | 0.6 0.2 0.2 |
       | 0   1   0 |           | 1/3 1/3 1/3 |           | 0.2 0.6 0.2 |
       | 0   0   1 |           | 1/3 1/3 1/3 |           | 0.2 0.2 0.6 |
Table 9.3: Example of transition probabilities over a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
P_k(j2, i2):  P1_E P1_E P1_E P3_E P3_E P2_E P2_E P2_E P3_E P1_E P1_E P1_E
9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · C_E(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: C_CM or C_PM

• Cost for interruption: C_I

Moreover, a terminal cost, denoted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by C_N(i) for each possible terminal component state i.
Table 9.4: Transition costs

i1                          | u | j1    | C_k(j, u, i)
Wq, q ∈ {0, ..., N_W−1}     | 0 | Wq+1  | G · Ts · C_E(i2, k)
Wq, q ∈ {0, ..., N_W−1}     | 0 | CM1   | C_I + C_CM
W_NW                        | 0 | W_NW  | G · Ts · C_E(i2, k)
W_NW                        | 0 | CM1   | C_I + C_CM
Wq                          | 1 | PM1   | C_I + C_PM
PMq, q ∈ {1, ..., N_PM−2}   | ∅ | PMq+1 | C_I + C_PM
PM_{N_PM−1}                 | ∅ | W0    | C_I + C_PM
CMq, q ∈ {1, ..., N_CM−2}   | ∅ | CMq+1 | C_I + C_CM
CM_{N_CM−1}                 | ∅ | W0    | C_I + C_CM
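As stated in the chapter introduction, the model can be solved with the value iteration algorithm, here in its finite-horizon form (backward induction). The sketch below is generic: the callbacks `decisions`, `transition`, `cost` and `terminal_cost` are illustrative placeholders for the decision space, Table 9.1, Table 9.4 and C_N(i); they are not defined in the thesis in this form.

```python
def backward_induction(states, horizon, decisions, transition, cost,
                       terminal_cost):
    """Finite-horizon dynamic programming:
    J_k(i) = min_u sum_j P(j, u, i) [C_k(j, u, i) + J_{k+1}(j)].
    `decisions(i)` returns the decision set (empty for forced transitions),
    `transition(i, u)` returns {j: prob}; all interfaces are illustrative."""
    J = {i: terminal_cost(i) for i in states}   # stage N: terminal costs
    policy = []
    for k in reversed(range(horizon)):          # stages N-1, ..., 0
        J_new, mu = {}, {}
        for i in states:
            options = decisions(i) or [None]    # empty set: forced move
            best_u, best_v = None, None
            for u in options:
                v = sum(p * (cost(k, i, u, j) + J[j])
                        for j, p in transition(i, u).items())
                if best_v is None or v < best_v:
                    best_u, best_v = u, v
            J_new[i], mu[i] = best_v, best_u
        J = J_new
        policy.insert(0, mu)                    # policy[k][i] = best decision
    return J, policy
```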
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would soon need maintenance anyway.
This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

N_C : Number of components
N_Wc : Number of working states for component c
N_PMc : Number of preventive maintenance states for component c
N_CMc : Number of corrective maintenance states for component c

Costs

C_PMc : Cost per stage of preventive maintenance for component c
C_CMc : Cost per stage of corrective maintenance for component c
C_Nc(i) : Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., N_C} : State of component c at the current stage
i_{NC+1} : Electricity state at the current stage
jc, c ∈ {1, ..., N_C} : State of component c at the next stage
j_{NC+1} : Electricity state at the next stage
uc, c ∈ {1, ..., N_C} : Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., N_C} : State of component c at stage k
xc : A component state
x{NC+1}_k : Electricity state at stage k
uc_k : Maintenance decision for component c at stage k

Probability functions

λc(i) : Failure probability function for component c

Sets

Ω_xc : State space for component c
Ω_x{NC+1} : Electricity state space
Ω_uc(ic) : Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages, with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component, to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.

• An interruption cost C_I is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x1_k, ..., xNC_k, x{NC+1}_k)   (9.2)

xc_k, c ∈ {1, ..., N_C}, represents the state of component c, and x{NC+1}_k represents the electricity state.

Component space

The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is denoted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_{N_PMc−1}, CM1, ..., CM_{N_CMc−1}}

Electricity space

Same as in the one-component model (Section 9.1).
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether or not to do preventive maintenance, depending on the state of the system:

uc_k = 0 : no preventive maintenance on component c
uc_k = 1 : preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u1_k, u2_k, ..., uNC_k)   (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., N_C}: Ω_uc(ic) = {0, 1} if ic ∈ {W0, ..., W_NWc}, ∅ otherwise
9243 Transition Probability
The state variables xc are independent of the electricity state xNc+1 Consequently
P (Xk+1 = j | Uk = UXk = i) (94)
= P ((j1 jNC ) (u1 uNC ) (i1 iNC )) middot P (jNC+1 jNC+1) (95)
The probabilities transition of the electricity states P (jNC+1 iNC+1) are similarto the one-component model They can be defined at each stage k by a transitionmatrices as in the example of Section 81
Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: ick ∈ {W1, ..., WNWc},

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)
Case 2

If one of the components is in maintenance, or preventive maintenance is decided for it,

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with
P^c = P(jc, 1, ic) if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1 if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
P^c = 0 else
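The two cases can be combined into a single joint transition probability computed as a product over components. The sketch below uses a hypothetical four-state component space and invented transition matrices; only the case structure follows the model:

```python
# Simplified component state space (an assumption of this sketch):
STATES = ["W0", "W1", "PM1", "CM1"]
WORKING = {"W0", "W1"}               # states in which the component ages

# P_AGE[i][j]: ageing transition when the whole system operates (case 1).
# P_MAINT[i][j]: transition when the component is maintained or down (case 2).
P_AGE = [[0.0, 0.9, 0.0, 0.1],       # W0 ages to W1 or fails
         [0.0, 0.8, 0.0, 0.2],       # W1 stays or fails
         [1.0, 0.0, 0.0, 0.0],       # (unused in case 1)
         [1.0, 0.0, 0.0, 0.0]]
P_MAINT = [[0.0, 0.0, 1.0, 0.0],     # PM decided: go to PM1
           [0.0, 0.0, 1.0, 0.0],
           [1.0, 0.0, 0.0, 0.0],     # PM finishes: back to W0
           [1.0, 0.0, 0.0, 0.0]]     # CM finishes: back to W0

def joint_transition_prob(j, u, i):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) as a product over components."""
    case1 = all(ic in WORKING for ic in i) and all(uc == 0 for uc in u)
    p = 1.0
    for jc, uc, ic in zip(j, u, i):
        if case1:                                  # case 1: every component ages
            p *= P_AGE[STATES.index(ic)][STATES.index(jc)]
        elif uc == 1 or ic not in WORKING:         # case 2: maintained component
            p *= P_MAINT[STATES.index(ic)][STATES.index(jc)]
        else:                                      # case 2: the others do not age
            p *= 1.0 if jc == ic else 0.0
    return p
```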
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: ick ∈ {W1, ..., WNWc},

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} C^c

with
C^c = CCMc if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
C^c = CPMc if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
C^c = 0 else
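A corresponding sketch of the stage cost, using the same hypothetical state labels as above. All constants are illustrative, and the sign of the case 1 term follows the formula above (it is a reward; with a cost-minimizing objective it would enter with a negative sign):

```python
# Illustrative (hypothetical) constants for a two-component unit.
G, TS = 1000.0, 168.0            # average production (kW), stage length (h)
C_I = 5000.0                     # interruption cost
C_CM = [30000.0, 40000.0]        # corrective replacement cost per component
C_PM = [10000.0, 15000.0]        # preventive replacement cost per component

def stage_cost(j, u, i, ce):
    """C((j1..jNC), (u1..uNC), (i1..iNC)); ce plays the role of CE(i_{NC+1}, k)."""
    # Case 1: all components working, nothing decided, no failure occurs.
    if (all(ic.startswith("W") for ic in i) and all(uc == 0 for uc in u)
            and all(jc.startswith("W") for jc in j)):
        return G * TS * ce       # reward for the energy produced during the stage
    # Case 2: interruption cost plus the cost of every maintenance action.
    cost = C_I
    for c, (jc, ic) in enumerate(zip(j, i)):
        if ic.startswith("CM") or jc == "CM1":
            cost += C_CM[c]
        elif ic.startswith("PM") or jc == "PM1":
            cost += C_PM[c]
    return cost
```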
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.
• Manpower. It would be interesting to limit the number of maintenance actions that can be performed at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
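The manpower extension in the first item could, for instance, replace the per-component decision spaces by one global decision space. The crew bound below is hypothetical:

```python
from itertools import product

# Sketch of a global decision space that caps the number of simultaneous
# maintenance actions; M_CREW is an assumed bound, not part of the model.
M_CREW = 2

def constrained_decision_space(n_components, max_actions=M_CREW):
    """All decision vectors with at most max_actions components in PM."""
    return [u for u in product((0, 1), repeat=n_components) if sum(u) <= max_actions]
```

With 4 components and a bound of 2, only 11 of the 16 unconstrained decision vectors remain.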
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm empirically converges fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid untractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
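The recursion above can be checked mechanically. In the sketch below, the arc costs C(k, i, u) are copied from the example, a decision u is identified with the state reached at the next stage, and the backward pass reproduces J*_0(0) = 8:

```python
# Arc costs C(k, i, u) of the shortest path example; u is the index of the
# state reached at the next stage (stage 4 has the single terminal state 0).
C = {
    0: {0: {0: 2, 1: 4, 2: 3}},                                    # A -> B, C, D
    1: {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},  # B, C, D
    2: {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},  # E, F, G
    3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                          # H, I, J
}

def value_iteration(C, N=4):
    J = {N: {0: 0.0}}                         # terminal cost: phi(0) = 0
    policy = {}
    for k in range(N - 1, -1, -1):            # backward pass over the stages
        J[k], policy[k] = {}, {}
        for i, arcs in C[k].items():          # arcs[u] = C(k, i, u)
            best = min(arcs, key=lambda u: arcs[u] + J[k + 1][u])
            policy[k][i] = best
            J[k][i] = arcs[best] + J[k + 1][best]
    return J, policy

J, policy = value_iteration(C)
print(J[0][0])   # prints 8.0: the optimal cost-to-go found above
```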
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A-H. Mohamed. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Acknowledgements
First of all, I would like to thank my supervisors, who each in their way supported me in this work: Ass. Prof. Lina Bertling, for her encouragements, constructive remarks and for giving me the opportunity of working on this project; Dr. Erik Dotzauer, for many valuable inputs, discussions and comments; and Prof. Michael Patriksson, for his help on mathematical writing.
Special greetings to all my friends and companions of study all over the world.

Finally, my heart turns to my parents and my love, for their endless encouragement and support in my studies and life.

Stockholm, June 2007
Abbreviations

ADP Approximate Dynamic Programming
CBM Condition Based Maintenance
CM Corrective Maintenance
DP Dynamic Programming
IHSDP Infinite Horizon Stochastic Dynamic Programming
LP Linear Programming
MDP Markov Decision Process
PI Policy Iteration
PM Preventive Maintenance
RCAM Reliability Centered Asset Maintenance
RCM Reliability Centered Maintenance
SDP Stochastic Dynamic Programming
SMDP Semi-Markov Decision Process
TBM Time Based Maintenance
VI Value Iteration
Notations
Numbers
M  Number of iterations for the evaluation step of modified policy iteration
N  Number of stages

Constants
α  Discount factor

Variables
i  State at the current stage
j  State at the next stage
k  Stage
m  Number of iterations left for the evaluation step of modified policy iteration
q  Iteration number for the policy iteration algorithm
u  Decision variable

State and Control Spaces
μk  Function mapping the states with a decision
μ*k(i)  Optimal decision at stage k for state i
μ  Decision policy for stationary systems
μ*  Optimal decision policy for stationary systems
π  Policy
π*  Optimal policy
Uk  Decision action at stage k
U*k(i)  Optimal decision action at stage k for state i
Xk  State at stage k

Dynamic and Cost Functions
Ck(i, u)  Cost function
Ck(i, u, j)  Cost function
Cij(u) = C(i, u, j)  Cost function if the system is stationary
CN(i)  Terminal cost for state i
fk(i, u)  Dynamic function
fk(i, u, ω)  Stochastic dynamic function
J*k(i)  Optimal cost-to-go from stage k to N starting from state i
ωk(i, u)  Probabilistic function of a disturbance
Pk(j, u, i)  Transition probability function
P(j, u, i)  Transition probability function for stationary systems
V(Xk)  Cost-to-go resulting from a trajectory starting from state Xk

Sets
ΩUk(i)  Decision space at stage k for state i
ΩXk  State space at stage k
Contents
Contents XI
1 Introduction 1
1.1 Background 1
1.2 Objective 2
1.3 Approach 2
1.4 Outline 2

2 Maintenance 5
2.1 Types of Maintenance 5
2.2 Maintenance Optimization Models 6

3 Introduction to the Power System 11
3.1 Power System Presentation 11
3.2 Costs 13
3.3 Main Constraints 13

4 Introduction to Dynamic Programming 15
4.1 Introduction 15
4.2 Deterministic Dynamic Programming 18

5 Finite Horizon Models 23
5.1 Problem Formulation 23
5.2 Optimality Equation 25
5.3 Value Iteration Method 25
5.4 The Curse of Dimensionality 26
5.5 Ideas for a Maintenance Optimization Model 26

6 Infinite Horizon Models - Markov Decision Processes 29
6.1 Problem Formulation 29
6.2 Optimality Equations 31
6.3 Value Iteration 31
6.4 The Policy Iteration Algorithm 31
6.5 Modified Policy Iteration 32
6.6 Average Cost-to-go Problems 33
6.7 Linear Programming 34
6.8 Efficiency of the Algorithms 35
6.9 Semi-Markov Decision Process 35

7 Approximate Methods for Markov Decision Process - Reinforcement Learning 37
7.1 Introduction 37
7.2 Direct Learning 38
7.3 Indirect Learning 41
7.4 Supervised Learning 42

8 Review of Models for Maintenance Optimization 43
8.1 Finite Horizon Dynamic Programming 43
8.2 Infinite Horizon Stochastic Models 44
8.3 Reinforcement Learning 45
8.4 Conclusions 45

9 A Proposed Finite Horizon Replacement Model 47
9.1 One-Component Model 47
9.2 Multi-Component Model 55
9.3 Possible Extensions 59

10 Conclusions and Future Work 61

A Solution of the Shortest Path Example 63

Reference List 65
Chapter 1
Introduction
1.1 Background

The market and competition laws have been introduced among power system companies due to the restructuring and deregulation of modern power systems. The generating companies, as well as transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.
Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 2.1).

CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruption, possible deterioration of the system, human risks or environmental consequences, etc.

PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components, and Condition Based Maintenance (CBM) for the components that are worth monitoring and not too expensive to monitor. These maintenance actions have costs for unsupplied energy, inspection, repair, replacement, etc.

Efficient maintenance should balance corrective and preventive maintenance to minimize the total cost of maintenance.
The probability of a functional failure of a component is stochastic. It depends on the state of the component, resulting from the history of the component (age, intensity of use, external stress (such as weather), maintenance actions, human errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that explicitly integrate stochastic behaviors. This feature makes the models interesting and was the starting idea of this work.
1.2 Objective
The main objective of this work is to investigate the use of stochastic dynamicprogramming models for maintenance optimization and identify possible future ap-plications in power systems
1.3 Approach
The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.

The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.

Some SDP models found in the literature were reviewed. Conclusions were drawn about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.

A finite horizon replacement model was developed to illustrate the possible use of SDP for power system maintenance.
1.4 Outline
Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.

Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.

Chapters 4-7 focus on different Dynamic Programming (DP) approaches and the algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods, respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving infinite horizon Dynamic Programming problems using approximate methods.
Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are made about the possible use of the different approaches in maintenance optimization.
Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization
Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.
Chapter 2
Maintenance
The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1, and some maintenance optimization models are reviewed in Section 2.2.
2.1 Types of Maintenance
Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item, intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 2.1 shows a general picture of the different types of maintenance.
Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way, or it is not worth it, to detect or prevent a failure.

Preventive maintenance aims at undertaking maintenance actions on a component before it fails, e.g. to avoid high costs of replacement, unsupplied power delivery and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:

1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.
[Figure 2.1: Maintenance tree, based on [1]. Maintenance is divided into Corrective Maintenance and Preventive Maintenance; preventive maintenance into Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), the latter being continuous, scheduled or inspection based.]
2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual inspection, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.
2.2 Maintenance Optimization Models
Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in very high costs.

The aim of maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influence factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.

In this section, the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].
2.2.1 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.
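For a concrete feel of the trade-off, the long-run cost rate of an age replacement policy can be written, by a standard renewal-reward argument, as g(T) = [cp S(T) + cf (1 − S(T))] / E[min(lifetime, T)], where S is the survival function. The sketch below evaluates this for a hypothetical Weibull lifetime and illustrative costs (none of the numbers come from [7] or [17]):

```python
import math

# Long-run cost rate of an age replacement policy (renewal-reward argument):
#   g(T) = [c_p * S(T) + c_f * (1 - S(T))] / E[min(lifetime, T)]
# where E[min(lifetime, T)] is the integral of S over [0, T].
# Lifetime distribution and costs below are hypothetical.
c_p, c_f = 1.0, 5.0            # preventive vs corrective replacement cost
beta, eta = 2.5, 10.0          # Weibull shape (>1: wear-out) and scale

def survival(t):
    return math.exp(-((t / eta) ** beta))

def cost_rate(T, n=200):
    h = T / n                  # trapezoidal rule for the expected cycle length
    cycle = h * (0.5 * (survival(0.0) + survival(T))
                 + sum(survival(k * h) for k in range(1, n)))
    return (c_p * survival(T) + c_f * (1.0 - survival(T))) / cycle

# Scan candidate replacement ages; with these numbers the optimum is interior,
# i.e. preventive replacement pays off because the failure rate increases.
T_best = min((0.1 * T for T in range(10, 300)), key=cost_rate)
```

The optimal age T_best beats running to failure here precisely because cf > cp and the Weibull shape parameter exceeds 1.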
A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]: if the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.
2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relations with failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.
For components subject to inspection, at each decision epoch one must decide whether maintenance should be performed and when the next inspection should occur. In [2], the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9], a Semi-Markov Decision Process (SMDP, see Chapter 4) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account information from condition monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of such a model is that the hazard function is the product of two functions, one depending on time and one on the parameters (monitored variables).
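The proportional hazards assumption can be written as λ(t | z) = λ0(t) · exp(β · z): a baseline hazard in time, scaled by a function of the monitored parameters z. The baseline and the coefficients in this sketch are invented for illustration:

```python
import math

# Proportional hazards model: lambda(t | z) = lambda0(t) * exp(beta . z).
# Baseline hazard and coefficients below are hypothetical.
beta = [0.8, 1.2]                  # influence of each monitored variable

def baseline_hazard(t):
    return 0.01 * t                # increasing (wear-out) baseline

def hazard(t, z):
    """Hazard at time t given the vector z of monitored variables."""
    return baseline_hazard(t) * math.exp(sum(b * zi for b, zi in zip(beta, z)))
```

The defining property is that the ratio of hazards for two different covariate vectors is constant over time, which is what makes the monitored variables separable from ageing.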
2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities of performing preventive maintenance: with the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example. Transportation to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money could be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short-term information. The approach can be used with many maintenance optimization models.
2.2.5 Other Types of Models and Criteria of Classification

Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components under consideration is important; e.g., multi-component models are more interesting in power systems. The time horizon considered in the model is also important: many articles consider an infinite time horizon, but more focus should be put on finite horizons since they are more practical. Another characteristic of a model is its time representation, i.e. whether discrete or continuous time is considered. A distinction can also be made between models with deterministic and stochastic lifetimes of components; among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.
3.1 Power System Presentation
Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description
A simple description of the power system includes the following main parts:
1. Generation: the generation units that produce the power. These can be e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.
2. Transmission: the transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.
3. Distribution: the distribution system is at a voltage level below transmission and connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).
4. Consumption: the consumers can be divided into different categories, such as industry, commercial, households, offices, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs also depend on the time of outage.
The trade of electricity between producers and consumers is made through different specific markets around the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.
The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.
3.1.2 Maintenance in Power Systems
The objective is to find the right way to do maintenance. Corrective maintenance and preventive maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).
Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
3.2 Costs
Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: cost for the maintenance team that performs the maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action results in an immediate cost (or reward) and influences the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that satisfies the principle of optimality:
An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the future evolution of the system and the possible actions.
Basically, in maintenance problems this would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.
If a system is subject to probabilistic events, it evolves according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.
Functional failures are in general represented as stochastic events. Consequently, stochastic maintenance optimization models are interesting.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner over time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time
In this thesis the focus is mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the time interval between two stages has an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuous set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities are briefly investigated in Chapter 5. Continuous decision making refers to optimal control theory and is not discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example: a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage $k$, the system is in a state $X_k = i$ that belongs to a state space $\Omega_{X_k}$. Depending on the state of the system, the decision maker decides on an action $u = U_k \in \Omega_{U_k}(i)$.
Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be $X_{k+1} = f_k(i, u)$. Moreover, the action has a cost that the decision maker has to pay, $C_k(i, u)$. A possible terminal cost $C_N(X_N)$ is associated with the terminal state (the state at stage $N$).
Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

$$J_0^*(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to

$$X_{k+1} = f_k(X_k, U_k), \quad k = 0, \dots, N-1$$
$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision/action at stage $k$
$C_k(i, u)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u)$: dynamic function
$J_0^*(i)$: optimal cost-to-go starting from state $i$
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage $k$ can be derived with the following formula:
$$J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \qquad (4.1)$$

$J_k^*(i)$: optimal cost-to-go from stage $k$ to $N$, starting from state $i$
The value iteration algorithm is a direct consequence of the optimality equation:
$$J_N^*(i) = C_N(i) \quad \forall i \in \Omega_{X_N}$$
$$J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_{X_k}$$
$$U_k^*(i) = \arg\min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_{X_k}$$
$u$: decision variable
$U_k^*(i)$: optimal decision/action at stage $k$ for state $i$
The algorithm goes backwards, starting from the last stage. It stops when $k = 0$.
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: a five-stage shortest path network. Node A (stage 0) connects to B, C, D (stage 1); these connect to E, F, G (stage 2); these connect to H, I, J (stage 3); all leading to the terminal node K (stage 4). The arc costs, as recovered from the example, are: A-B = 2, A-C = 4, A-D = 3; B-E = 4, B-F = 6; C-E = 2, C-F = 1, C-G = 3; D-F = 5, D-G = 2; E-H = 2, E-I = 5; F-H = 7, F-I = 3, F-J = 2; G-I = 1, G-J = 2; H-K = 4, I-K = 2, J-K = 7.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively, determining the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: $n = 5$, $k = 0, 1, 2, 3, 4$.
State Space
The state space is defined for each stage:
$$\Omega_{X_0} = \{A\} = \{0\}, \quad \Omega_{X_1} = \{B, C, D\} = \{0, 1, 2\}, \quad \Omega_{X_2} = \{E, F, G\} = \{0, 1, 2\}$$
$$\Omega_{X_3} = \{H, I, J\} = \{0, 1, 2\}, \quad \Omega_{X_4} = \{K\} = \{0\}$$
Each node of the problem is defined by a state $X_k$. For example, $X_2 = 1$ corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which $X_k$ would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which arc to take from the current node to the next stage. The following notations are used:
$$\Omega_{U_k}(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \quad \text{for } k = 1, 2, 3$$
$$\Omega_{U_0}(0) = \{0, 1, 2\} \quad \text{for } k = 0$$
For example, $\Omega_{U_1}(0) = \Omega_U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition B ⇒ E, or $U_1(0) = 1$ for the transition B ⇒ F.
Another example: $\Omega_{U_1}(2) = \Omega_U(D) = \{1, 2\}$, with $u_1(2) = 1$ for the transition D ⇒ F, or $u_1(2) = 2$ for the transition D ⇒ G.
A sequence $\pi = \{\mu_0, \mu_1, \dots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state $i$ at stage $k$ to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu_0^*, \mu_1^*, \dots, \mu_N^*\}$.
Dynamic and Cost Functions
The dynamic function of the example is simple, thanks to the notation used: $f_k(i, u) = u$.
The transition costs are defined as equal to the distance from one state to the resulting state of the decision. For example, $C_1(0, 0) = C(B \Rightarrow E) = 4$. The cost function is defined in the same way for the other stages and states.
Objective Function

$$J_0^*(0) = \min_{U_k \in \Omega_{U_k}(X_k)} \left[ \sum_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to

$$X_{k+1} = f_k(X_k, U_k), \quad k = 0, 1, \dots, N-1$$
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, using the optimal solutions determined by the DP algorithm for the sequence of states that are visited.
The solutions of the algorithm are given in Appendix A.
The optimal cost-to-go is $J_0^*(0) = 8$. It corresponds to the path A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$ with $\mu_k(i) = u_k^*(i)$ (for example $\mu_1(1) = 2$, $\mu_1(2) = 2$).
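The backward recursion and the forward pass can be sketched in a few lines of code. The arc costs below are the ones recovered from the example (they reproduce both the quoted path cost A-B-F-J-K = 17 and the optimum 8); treat the sketch as illustrative.

```python
# Value iteration for the shortest path example of Section 4.2.3.
# Arc costs reconstructed from the text; verify against the original figure.
arcs = {
    'A': {'B': 2, 'C': 4, 'D': 3},
    'B': {'E': 4, 'F': 6},
    'C': {'E': 2, 'F': 1, 'G': 3},
    'D': {'F': 5, 'G': 2},
    'E': {'H': 2, 'I': 5},
    'F': {'H': 7, 'I': 3, 'J': 2},
    'G': {'I': 1, 'J': 2},
    'H': {'K': 4}, 'I': {'K': 2}, 'J': {'K': 7},
}
stages = [['A'], ['B', 'C', 'D'], ['E', 'F', 'G'], ['H', 'I', 'J'], ['K']]

J = {'K': 0.0}                           # terminal cost C_N = 0
policy = {}
for stage in reversed(stages[:-1]):      # backward recursion over the stages
    for i in stage:
        u, cost = min(arcs[i].items(), key=lambda a: a[1] + J[a[0]])
        J[i] = cost + J[u]
        policy[i] = u

# Forward pass: follow the optimal policy from A to K
path, node = ['A'], 'A'
while node != 'K':
    node = policy[node]
    path.append(node)
print(J['A'], path)                      # 8.0 and the path A-D-G-I-K
```

With these costs the recursion reproduces the optimal cost-to-go 8 and the optimal path of the example.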
Chapter 5
Finite Horizon Models
In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable describing the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as follows:
State Space
A variable $k \in \{0, \dots, N\}$ represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on $k$: $X_k \in \Omega_{X_k}$.
Decision Space
At each decision epoch, the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on the stage: $u \in \Omega_{U_k}(i)$.
Dynamics of the System and Transition Probabilities
In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance $\omega = \omega_k(i, u)$:
$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \dots, N-1$$
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage $k+1$ is $j$, if the state and control at stage $k$ are $i$ and $u$. These probabilities can also depend on the stage:
$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$
If the system is stationary (time-invariant), the dynamic function $f$ does not depend on time and the notation for the probability function can be simplified:
$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$
In this case, one refers to a Markov decision process. If a control $u$ is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition $(i, j)$ and action $u$. The costs can also depend on the stage:
$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$
If the transition $(i, j)$ occurs at stage $k$ when the decision is $u$, then a cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, the notation is simplified to $C(j, u, i)$.
A terminal cost $C_N(i)$ can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:

$$J^*(X_0) = \min_{U_k \in \Omega_{U_k}(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

subject to

$$X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), \quad k = 0, 1, \dots, N-1$$
$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision/action at stage $k$
$\omega_k(i, u)$: probabilistic function of the disturbance
$C_k(j, u, i)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u, \omega)$: dynamic function
$J_0^*(i)$: optimal cost-to-go starting from state $i$
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:
$$J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} E\left[ C_k(i, u) + J_{k+1}^*(f_k(i, u, \omega)) \right] \qquad (5.1)$$
This equation defines a condition for the cost-to-go function of a state $i$ at stage $k$ to be optimal. The equation can be rewritten using the transition probabilities:
$$J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \qquad (5.2)$$
$\Omega_{X_k}$: state space at stage $k$
$\Omega_{U_k}(i)$: decision space at stage $k$ for state $i$
$P_k(j, u, i)$: transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on Equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
$$J_N^*(i) = C_N(i) \quad \forall i \in \Omega_{X_N} \quad \text{(initialization)}$$

While $k \geq 0$ do
$$J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \quad \forall i \in \Omega_{X_k}$$
$$U_k^*(i) = \arg\min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \quad \forall i \in \Omega_{X_k}$$
$$k \leftarrow k - 1$$
$u$: decision variable
$U_k^*(i)$: optimal decision/action at stage $k$ for state $i$
The recursion finishes when the first stage is reached
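As an illustration, the value iteration loop above can be sketched for a small hypothetical maintenance model. The three deterioration states, the two actions, and all probabilities and costs below are invented for the example.

```python
# Sketch of the stochastic value iteration algorithm of Section 5.3 on a
# hypothetical three-state deterioration model (0 = new, 1 = worn, 2 = failed)
# with actions 'keep' and 'replace'. All probabilities and costs are invented.
N = 10                        # number of stages
states = [0, 1, 2]
P = {                         # P[u][i][j] = P(X_{k+1} = j | X_k = i, U_k = u)
    'keep':    [[0.7, 0.3, 0.0],
                [0.0, 0.6, 0.4],
                [0.0, 0.0, 1.0]],
    'replace': [[1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0]],
}
C = {'keep': [0.0, 2.0, 20.0],        # expected stage cost C(i, u)
     'replace': [30.0, 30.0, 30.0]}

J = [0.0, 0.0, 0.0]           # terminal cost C_N = 0
policy = []
for k in reversed(range(N)):  # backward recursion over the stages
    Jk, Uk = [], []
    for i in states:
        u_best, cost = min(
            ((u, C[u][i] + sum(P[u][i][j] * J[j] for j in states)) for u in P),
            key=lambda t: t[1])
        Jk.append(cost)
        Uk.append(u_best)
    J, policy = Jk, [Uk] + policy

print(J[0])                   # expected cost-to-go of a new component at stage 0
```

Note that the resulting policy is stage-dependent: far from the horizon a failed component is replaced, while at the very last stage replacement no longer pays off.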
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:

• $N$ stages,

• $N_X$ state variables, where the size of the set for each state variable is $S$,

• $N_U$ control variables, where the size of the set for each control variable is $A$.
The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
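A quick back-of-the-envelope computation illustrates this growth; the sizes N = 50 stages, S = 10 and A = 3 below are arbitrary illustration values, not taken from the thesis.

```python
# Back-of-the-envelope growth of the value iteration workload
# N * S**(2*N_X) * A**N_U from Section 5.4, for arbitrary sizes.
N, S, A = 50, 10, 3

def workload(n_x, n_u):
    """Order-of-magnitude operation count of finite horizon value iteration."""
    return N * S ** (2 * n_x) * A ** n_u

for n_x in (1, 2, 3):
    # each additional state variable multiplies the workload by S**2 = 100
    print(n_x, workload(n_x, n_u=1))
```

Going from one to three state variables multiplies the workload by a factor 10 000, which is why exact value iteration quickly becomes impractical for multi-component models.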
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for a maintenance model based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for a component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered in order to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.
Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).
This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice, one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form $\pi = \{\mu, \mu, \mu, \dots\}$, where $\mu$ is a function mapping the state space to the control space: for $i \in \Omega_X$, $\mu(i)$ is an admissible control for the state $i$, $\mu(i) \in \Omega_U(i)$.
The objective is to find the optimal $\mu^*$, i.e. the policy that minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a cost-free terminal state that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.
$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to

$$X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \dots, N-1$$
$\mu$: decision policy
$J^*(i)$: optimal cost-to-go function for state $i$
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor $\alpha$, where $\alpha$ is a discount factor ($0 < \alpha < 1$). The cost function for discounted IHSDP has the form $\alpha^k \cdot C_{ij}(u)$.
As $C_{ij}(u)$ is bounded, the infinite sum converges (a decreasing geometric progression).
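This convergence claim can be made explicit. If the stage costs are bounded, $|C_{ij}(u)| \le C_{\max}$ for all $i$, $j$, $u$, the discounted sum is dominated by a geometric series (the bound below is a standard argument, added here for illustration):

```latex
\left| \sum_{k=0}^{\infty} \alpha^k \, C_{X_k X_{k+1}}(U_k) \right|
\;\le\; \sum_{k=0}^{\infty} \alpha^k \, C_{\max}
\;=\; \frac{C_{\max}}{1-\alpha} \;<\; \infty ,
\qquad 0 < \alpha < 1 .
```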
$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to

$$X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \dots, N-1$$
$\alpha$: discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free terminal state or a discounted cost.
To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to

$$X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \dots, N-1$$
6.2 Optimality Equations
The optimality equations are formulated using the probability function $P_{ij}(u)$.
The stationary policy $\mu^*$ that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \quad \forall i \in \Omega_X$$

$J_\mu(i)$: cost-to-go function of policy $\mu$ starting from state $i$
$J^*(i)$: optimal cost-to-go function for state $i$
For an IHSDP discounted problem, the optimality equation is:
$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega_X$$
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and $\frac{1}{1-\alpha}$.
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined for the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the policy based on the expected cost-to-go function. This two-step algorithm is applied iteratively. The process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy $\mu_0$. It can then be described by the following steps:
Step 1: Policy Evaluation

If $\mu_{q+1} = \mu_q$, stop the algorithm. Else, $J_{\mu_q}(i)$, the solution of the following linear system, is calculated:

$$J_{\mu_q}(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot \left[ C(j, \mu_q(i), i) + J_{\mu_q}(j) \right]$$

$q$: iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy $\mu_q$.
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu_q}(j) \right]$$

Go back to the policy evaluation step.
The process stops when $\mu_{q+1} = \mu_q$.
At each iteration the algorithm improves the policy. If the initial policy $\mu_0$ is already good, then the algorithm converges quickly to the optimal solution.
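The two steps can be sketched for a small discounted MDP; the two states, two actions and all numbers below are invented for illustration, and the policy evaluation solves the linear system directly with NumPy.

```python
import numpy as np

# Sketch of policy iteration (Section 6.4) on a hypothetical discounted MDP
# with 2 states and 2 actions; transition probabilities and expected stage
# costs are invented for the example.
alpha = 0.9                                # discount factor
P = np.array([[[0.8, 0.2], [0.3, 0.7]],    # P[i][u][j]
              [[0.5, 0.5], [0.1, 0.9]]])
C = np.array([[1.0, 3.0],                  # C[i][u]: expected stage cost
              [2.0, 0.5]])
n = C.shape[0]

mu = np.zeros(n, dtype=int)                # initial policy mu_0
while True:
    # Step 1: policy evaluation -- solve (I - alpha * P_mu) J = C_mu
    P_mu = P[np.arange(n), mu]
    J = np.linalg.solve(np.eye(n) - alpha * P_mu, C[np.arange(n), mu])
    # Step 2: policy improvement
    Q = C + alpha * (P @ J)                # Q[i][u] = C(i,u) + alpha*sum_j P*J
    mu_next = Q.argmin(axis=1)
    if np.array_equal(mu_next, mu):        # policy is its own improvement: stop
        break
    mu = mu_next

print(mu, J)
```

On this toy problem the algorithm terminates after a couple of iterations, illustrating the finite termination property mentioned above.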
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, in each evaluation step, the value iteration algorithm for a finite number of iterations $M$ to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu_k}(i)$ that must be chosen higher than the real value $J_{\mu_k}(i)$.
While $m \geq 0$ do
$$J^m_{\mu_k}(i) = \sum_{j \in \Omega_X} P(j, \mu_k(i), i) \cdot \left[ C(j, \mu_k(i), i) + J^{m+1}_{\mu_k}(j) \right] \quad \forall i \in \Omega_X$$
$$m \leftarrow m - 1$$
$m$: number of iterations left in the evaluation step of modified policy iteration
The algorithm stops when $m = 0$, and $J_{\mu_k}$ is approximated by $J^0_{\mu_k}$.
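A sketch of this scheme on a small hypothetical discounted MDP (all data invented): the exact linear solve of the evaluation step is replaced by M value iteration sweeps under the fixed policy, and the value function is initialized above its true value as required.

```python
import numpy as np

# Sketch of modified policy iteration (Section 6.5) on an invented 2-state,
# 2-action discounted MDP: policy evaluation uses M sweeps of value iteration.
alpha, M = 0.9, 50
P = np.array([[[0.8, 0.2], [0.3, 0.7]],    # P[i][u][j]
              [[0.5, 0.5], [0.1, 0.9]]])
C = np.array([[1.0, 3.0], [2.0, 0.5]])     # C[i][u]: expected stage cost
n = C.shape[0]

mu = np.zeros(n, dtype=int)
J = np.full(n, C.max() / (1 - alpha))      # initialized above the true values
for _ in range(100):                       # outer policy iteration loop
    P_mu = P[np.arange(n), mu]
    C_mu = C[np.arange(n), mu]
    for _ in range(M):                     # approximate evaluation: M sweeps
        J = C_mu + alpha * (P_mu @ J)
    mu_next = (C + alpha * (P @ J)).argmin(axis=1)
    if np.array_equal(mu_next, mu):
        break
    mu = mu_next

print(mu, J)
```

The returned value function is only an approximation of the exact policy cost-to-go; the larger M, the closer it gets, at the price of more sweeps per evaluation.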
6.6 Average Cost-to-go Problems
The methods presented in Sections 6.2-6.5 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X ∈ ΩX, there is a unique λ_μ and vector h_μ such that:

h_μ(X) = 0

λ_μ + h_μ(i) = Σ_{j∈ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)]   ∀i ∈ ΩX

This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ ΩX

μ*(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ ΩX
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary state, and h^0(i) is chosen arbitrarily.

H^k = min_{u∈ΩU(X)} Σ_{j∈ΩX} P(j, u, X) · [C(j, u, X) + h^k(j)]

h^{k+1}(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h^k(j)] − H^k   ∀i ∈ ΩX

μ^{k+1}(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h^k(j)]   ∀i ∈ ΩX
The sequence h^k converges if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. In theory, an infinite number of iterations is needed.
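As a minimal sketch of the iteration above, assuming a unichain MDP and the hypothetical P[u, i, j] array layout used earlier (a fixed iteration count stands in for a proper stopping test):

```python
import numpy as np

def relative_value_iteration(P, C, ref=0, n_iter=1000):
    """Relative value iteration sketch for the average cost-to-go problem.
    Assumes a unichain MDP; `ref` plays the role of the reference state X."""
    n_u, n_x, _ = P.shape
    h = np.zeros(n_x)                       # h^0 chosen arbitrarily
    H = 0.0
    for _ in range(n_iter):
        Q = np.sum(P * (C + h), axis=2)     # Q[u, i] = sum_j P * [C + h^k(j)]
        H = Q[:, ref].min()                 # H^k, evaluated at the reference state
        h = Q.min(axis=0) - H               # h^{k+1}; note h^{k+1}(ref) = 0
    mu = np.argmin(np.sum(P * (C + h), axis=2), axis=0)
    return H, h, mu                         # H approximates the optimal average cost
```

Subtracting H^k pins h^k(X) at zero, which prevents the iterates from drifting off to infinity as plain value iteration would in the undiscounted average-cost setting.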
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.

Initialization: X can be chosen arbitrarily.

Step 1: Policy Evaluation
If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) ∀i ∈ ΩX, stop the algorithm. Else, solve the system of equations:

h^q(X) = 0

λ^q + h^q(i) = Σ_{j∈ΩX} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + h^q(j)]   ∀i ∈ ΩX

Step 2: Policy Improvement

μ^{q+1}(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h^q(j)]   ∀i ∈ ΩX

q = q + 1; go back to Step 1.
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP, the optimal cost-to-go satisfies

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · J*(j)]   ∀i ∈ ΩX

J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈ΩX} J(i)
Subject to J(i) − α · Σ_{j∈ΩX} P(j, u, i) · J(j) ≤ Σ_{j∈ΩX} P(j, u, i) · C(j, u, i)   ∀u, i
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
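The LP above can be handed to an off-the-shelf solver. The sketch below assumes scipy is available and passes the maximization of Σ J(i) to `linprog` as a minimization of −Σ J(i); the data layout P[u, i, j] is the same hypothetical one used earlier:

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """LP sketch for the discounted IHSDP:
    maximize sum_i J(i)
    s.t. J(i) - alpha * sum_j P(j,u,i) J(j) <= sum_j P(j,u,i) C(j,u,i)  for all i, u
    """
    n_u, n_x, _ = P.shape
    A_ub = np.zeros((n_u * n_x, n_x))
    b_ub = np.zeros(n_u * n_x)
    for u in range(n_u):
        for i in range(n_x):
            row = u * n_x + i
            A_ub[row] = -alpha * P[u, i]        # -alpha * sum_j P(j,u,i) J(j)
            A_ub[row, i] += 1.0                 # + J(i)
            b_ub[row] = P[u, i] @ C[u, i]       # expected one-stage cost
    res = linprog(-np.ones(n_x), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n_x)
    return res.x                                # optimal cost-to-go J*(i)
```

Note how the one constraint per state-action pair makes the constraint count grow with |ΩX|·|ΩU|, which is one reason LP scales worse than value or policy iteration.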
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm converges quite fast if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite, and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach in machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, in order to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space, and are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state Xk, is:

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V(Xk): Cost-to-go of a trajectory starting from state Xk
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i at its m-th visit

A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(Xk) := J(Xk) + γ_{Xk} · [V(Xk) − J(Xk)]

γ_{Xk} corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) must be calculated from the whole trajectory, and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).

At each transition of the trajectory, the cost-to-go function of the states of the trajectory is updated. Assume that the l-th transition is being generated. Then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) := J(Xk) + γ_{Xk} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]   ∀k = 0, ..., l
TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) := J(Xk) + γ_{Xk} · λ^{l−k} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]   ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is:

J(Xk) := J(Xk) + γ_{Xk} · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
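The TD(0) update can be sketched in tabular form for a stochastic shortest path problem. The simulator `episode()` below is a hypothetical stand-in that yields the transitions (Xk, Xk+1, C(Xk, Xk+1)) of one trajectory generated under the fixed policy μ:

```python
def td0(episode, n_states, n_episodes):
    """TD(0) policy evaluation sketch (tabular).  The step size gamma_Xk = 1/m,
    where m is the number of visits to Xk, as in the text."""
    J = [0.0] * n_states
    visits = [0] * n_states
    for _ in range(n_episodes):
        for x, x_next, cost in episode():
            visits[x] += 1
            gamma = 1.0 / visits[x]
            # TD(0): J(Xk) := J(Xk) + gamma * [C(Xk, Xk+1) + J(Xk+1) - J(Xk)]
            J[x] += gamma * (cost + J[x_next] - J[x])
    return J
```

Unlike the trajectory-based estimate, each update only needs the current transition, so the estimate is refined on-line as the simulation proceeds.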
Q-factors: Once J_{μ^k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Q_{μ^k}(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J_{μ^k}(j)]

Note that C(j, u, i) must be known. The improved policy is:

μ^{k+1}(i) = argmin_{u∈ΩU(i)} Q_{μ^k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μ^k} and Q_{μ^k} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain:

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]   (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do:

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ) · Q(Xk, Uk) + γ · [C(Xk, Xk+1, Uk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
The exploration/exploitation trade-off: Convergence of the algorithm to the optimal solution would require that all the pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
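The update rule and the trade-off above can be sketched together with an ε-greedy rule, one common way to alternate exploitation and exploration. The simulator `step(x, u)`, the terminal-state restart and the parameter ε are all hypothetical choices for the sketch, assuming a stochastic shortest path problem:

```python
import random

def q_learning(step, n_states, n_actions, n_steps, terminal, epsilon=0.1):
    """Tabular Q-learning sketch with an epsilon-greedy exploration/exploitation
    trade-off.  `step(x, u)` returns (x_next, cost); gamma = 1/m as for TD."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    x = 0
    for _ in range(n_steps):
        if random.random() < epsilon:                 # exploration phase
            u = random.randrange(n_actions)
        else:                                         # exploitation: greedy policy
            u = min(range(n_actions), key=lambda a: Q[x][a])
        x_next, cost = step(x, u)
        visits[x][u] += 1
        gamma = 1.0 / visits[x][u]
        target = cost + min(Q[x_next])                # Ck + min_u' Q(Xk+1, u')
        Q[x][u] = (1 - gamma) * Q[x][u] + gamma * target
        x = 0 if x_next == terminal else x_next       # restart episodes at the goal
    return Q
```

With ε = 0, only the current greedy policy is ever evaluated and suboptimal actions may never have their Q-factors corrected; the occasional random action is what gives every pair (i, u) a chance of being tried.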
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces, they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the table representation investigated previously, J_μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function, and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
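As a minimal illustration of the idea, the sketch below fits a linear architecture J̃(i, r) = r·φ(i) by least squares on a training set of sampled cost-to-go values. The linear structure and the feature map φ are hypothetical choices for the example, not the thesis's method:

```python
import numpy as np

def fit_approximation(samples, features):
    """Least-squares fit of an approximate cost-to-go J~(i, r) = r . phi(i).
    `samples` is a training set of (state, estimated cost-to-go) pairs, e.g.
    obtained from Monte Carlo or TD; only the vector r is stored."""
    states, targets = zip(*samples)
    Phi = np.array([features(x) for x in states])     # design matrix
    r, *_ = np.linalg.lstsq(Phi, np.array(targets), rcond=None)
    return r

def j_approx(x, r, features):
    """Evaluate the approximation at any state, visited or not."""
    return float(np.dot(r, features(x)))
```

The point of the structure is the last function: the fitted r generalizes the sampled information to states that never appeared in the training set.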
Chapter 8

Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built, using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance actions are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Process
Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is discussed. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance optimization and scheduling
- Method: value iteration
- Disadvantage: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model; possible approaches: average cost-to-go, discounted, shortest path
- Possible applications: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
- Methods (classical methods for MDP): value iteration (VI), which can converge fast for a high discount factor; policy iteration (PI), faster in general; linear programming, which allows additional constraints but is limited to smaller state spaces than VI and PI

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application: optimization of inspection-based maintenance
- Methods: same as MDP
- Disadvantage: complex (average cost-to-go approach)

Approximate Dynamic Programming
- Characteristics: can handle large state spaces
- Possible application: same as MDP, for systems larger than classical MDP methods can treat
- Methods: TD-learning, Q-learning
- Advantage: can work without an explicit model
Chapter 9

A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component, and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer, and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

NE: Number of electricity scenarios
NW: Number of working states for the component
NPM: Number of preventive maintenance states for one component
NCM: Number of corrective maintenance states for one component

Costs

CE(s, k): Electricity cost at stage k for the electricity state s
CI: Cost per stage for interruption
CPM: Cost per stage of preventive maintenance
CCM: Cost per stage of corrective maintenance
CN(i): Terminal cost if the component is in state i

Variables

i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage

State and Control Space

x1k: Component state at stage k
x2k: Electricity state at stage k

Probability functions

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state Wi

Sets

Ωx1: Component state space
Ωx2: Electricity state space
ΩU(i): Decision space for state i

State notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption, CI per stage, is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario; NX = 2.

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component, and Ωx2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable, x1k. There are three types of possible states for the variable: normal states (W), when the component is working; corrective maintenance (CM) states, if the component is in maintenance due to failure; and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond to NCM and NPM, respectively.

To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can for example correspond to the time when λ(t) exceeds a given limit. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
Figure 9.1: Example of a Markov decision process for one component, with NCM = 3, NPM = 2 and NW = 4. Solid lines: u = 0; dashed lines: u = 1. From each working state Wq, the component moves to CM1 with probability Ts·λ(q) and continues to Wq+1 (or stays in W4) with probability 1 − Ts·λ(q); the PM and CM chains lead back to W0 with probability 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state

Electricity scenarios are associated with one state variable, x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed, and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3 (electricity prices in SEK/MWh over the stages).
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, ∅ otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).
The transition probability for the component state is stationary It can be repre-sented as a Markov decision process as in the example in Figure 91
Table 91 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1 (respectively CM1) corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 92 and 93 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.
53
Table 91 Transition probabilities

i1                           u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW − 1}     0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW − 1}     0   CM1      λ(Wq)
W_NW                         0   W_NW     1 − λ(W_NW)
W_NW                         0   CM1      λ(W_NW)
Wq, q ∈ {0, ..., NW}         1   PM1      1
PMq, q ∈ {1, ..., NPM − 2}   ∅   PMq+1    1
PM_{NPM−1}                   ∅   W0       1
CMq, q ∈ {1, ..., NCM − 2}   ∅   CMq+1    1
CM_{NCM−1}                   ∅   W0       1
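As a hedged illustration, the rows of Table 91 can be generated programmatically. The sketch below assumes the sizes of Figure 91 (NW = 4, NPM = 2, NCM = 3); the failure-rate function `lam()` is an invented placeholder, not a value from the text.

```python
# Sketch of the one-component transition probabilities of Table 91,
# using the sizes of Figure 91 (N_W = 4, N_PM = 2, N_CM = 3). The
# per-stage failure probabilities lam() are illustrative assumptions.
N_W, N_PM, N_CM = 4, 2, 3
states = [f"W{q}" for q in range(N_W + 1)] + \
         [f"PM{q}" for q in range(1, N_PM)] + \
         [f"CM{q}" for q in range(1, N_CM)]

def lam(q):
    """Assumed constant per-stage failure probability in state Wq."""
    return 0.05 + 0.02 * q

def transitions(i, u):
    """Return {next_state: probability} for state i under decision u
    (u = 0: no PM, u = 1: start PM, u = None: component in maintenance)."""
    if i.startswith("W"):
        q = int(i[1:])
        if u == 1:                        # preventive replacement starts
            return {"PM1": 1.0}
        nxt = f"W{min(q + 1, N_W)}"       # ageing; W_NW absorbs if no failure
        return {nxt: 1.0 - lam(q), "CM1": lam(q)}
    # maintenance states advance deterministically and end in W0 (as new)
    kind, q = i[:2], int(i[2:])
    last = (N_PM if kind == "PM" else N_CM) - 1
    return {(f"{kind}{q + 1}" if q < last else "W0"): 1.0}

# every row of the transition table sums to one
assert all(abs(sum(transitions(s, 0 if s.startswith("W") else None).values()) - 1.0) < 1e-12
           for s in states)
```

The helper returns one row of Table 91 at a time, which is convenient when the table is consumed by a value-iteration loop.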
Table 92 Example of transition matrices for electricity scenarios

P1E =
| 1   0   0 |
| 0   1   0 |
| 0   0   1 |

P2E =
| 1/3 1/3 1/3 |
| 1/3 1/3 1/3 |
| 1/3 1/3 1/3 |

P3E =
| 0.6 0.2 0.2 |
| 0.2 0.6 0.2 |
| 0.2 0.2 0.6 |
Table 93 Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
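To make the stage-dependent behaviour concrete, the sketch below propagates a scenario distribution through the 12-stage schedule of Table 93, using the matrices of Table 92. The starting distribution (all mass on S1) is an assumption for the example.

```python
# Stage-dependent electricity-scenario chain of Tables 92-93: a row
# distribution over the three scenarios is pushed through the schedule.
P1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2 = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
schedule = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]  # Table 93

def step(dist, P):
    """One stage: row vector times transition matrix (rows = i2, cols = j2)."""
    return [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

dist = [1.0, 0.0, 0.0]        # assumed start: scenario S1 (e.g. a dry year)
for P in schedule:
    dist = step(dist, P)
print(dist)                   # scenario distribution after the last stage
```

Because P2 has uniform rows, the distribution becomes uniform when it is first applied, and the later doubly stochastic matrices keep it uniform.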
9144 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: CCM or CPM
• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 94. Notice that i2 is a state variable.

A possible terminal cost CN(i1) is defined for each possible terminal state i1 of the component.
Table 94 Transition costs

i1                           u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW − 1}     0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW − 1}     0   CM1      CI + CCM
W_NW                         0   W_NW     G · Ts · CE(i2, k)
W_NW                         0   CM1      CI + CCM
Wq                           1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM − 2}   ∅   PMq+1    CI + CPM
PM_{NPM−1}                   ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM − 2}   ∅   CMq+1    CI + CCM
CM_{NCM−1}                   ∅   W0       CI + CCM
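As an illustration of how the one-component model can be solved, the sketch below runs backward induction (finite-horizon value iteration) combining the transition structure of Table 91 with the cost structure of Table 94. All numeric values (failure rates, costs, price, horizon length) are invented for the example, and the electricity state is collapsed to a single flat-price scenario; production rewards enter as negative costs.

```python
# Backward induction for the one-component model (Tables 91 and 94).
# All numbers are illustrative assumptions, not values from the text.
N_W, N_PM, N_CM = 4, 2, 3
N = 12                                   # stages in the horizon
G, Ts = 1.0, 100.0                       # average output (kW), stage length (h)
C_PM, C_CM, C_I = 150.0, 500.0, 100.0    # per-stage PM/CM costs, interruption
CE = lambda k: 300.0                     # electricity price (SEK/MWh), flat

states = [f"W{q}" for q in range(N_W + 1)] + \
         [f"PM{q}" for q in range(1, N_PM)] + \
         [f"CM{q}" for q in range(1, N_CM)]

def lam(q):                              # assumed per-stage failure probability
    return 0.05 + 0.02 * q

def actions(i):
    """List of (decision, {next_state: prob}) pairs available in state i."""
    if i.startswith("W"):
        q = int(i[1:])
        nxt = f"W{min(q + 1, N_W)}"
        acts = [(0, {nxt: 1.0 - lam(q), "CM1": lam(q)})]
        if q >= 1:                       # PM can be decided in W1..W_NW
            acts.append((1, {"PM1": 1.0}))
        return acts
    kind, q = i[:2], int(i[2:])          # maintenance advances deterministically
    last = (N_PM if kind == "PM" else N_CM) - 1
    return [(None, {(f"{kind}{q + 1}" if q < last else "W0"): 1.0})]

def cost(i, j, k):
    """Transition cost of Table 94; production reward as negative cost."""
    if i.startswith("W") and j.startswith("W"):
        return -G * Ts * CE(k) / 1000.0  # kWh -> MWh for the SEK/MWh price
    c = C_CM if (j == "CM1" or i.startswith("CM")) else C_PM
    return C_I + c

J = {s: 0.0 for s in states}             # zero terminal cost assumed
policy = []
for k in range(N - 1, -1, -1):           # backward induction over the stages
    Jk, mu = {}, {}
    for s in states:
        vals = {u: sum(p * (cost(s, j, k) + J[j]) for j, p in probs.items())
                for u, probs in actions(s)}
        mu[s] = min(vals, key=vals.get)
        Jk[s] = vals[mu[s]]
    J, policy = Jk, [mu] + policy
print(J["W0"], policy[0])                # cost-to-go of a new component, stage-0 policy
```

The resulting `policy[k][s]` is the optimal decision map μ*k of the finite-horizon formulation in Chapter 5.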
92 Multi-Component model
In this section the model presented in Section 91 is extended to multi-component systems.
921 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
922 Notations for the Proposed Model
Numbers
NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c

Costs
CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables
ic, c ∈ {1, ..., NC}    State of component c at the current stage
i_NC+1                  Electricity state at the current stage
jc, c ∈ {1, ..., NC}    State of component c at the next stage
j_NC+1                  Electricity state at the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c

State and Control Spaces
x_ck    State of component c at stage k, c ∈ {1, ..., NC}
x_c     A component state
x_NC+1,k  Electricity state at stage k
u_ck    Maintenance decision for component c at stage k

Probability functions
λc(i)   Failure probability function for component c

Sets
Ωxc        State space for component c
Ωx_NC+1    Electricity state space
Ωuc(ic)    Decision space for component c in state ic
923 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh is produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
924 Model Description
9241 State Space
The state of the system can be represented by a vector as in (92)
Xk = (x_1k, ..., x_NC k, x_NC+1 k)^T (92)

where x_ck, c ∈ {1, ..., NC}, represents the state of component c, and x_NC+1 k represents the electricity state.
Component Space
The numbers of CM and PM states for component c are NCMc and NPMc respectively. The number of W states for component c, NWc, is decided in the same way as for the one-component model.

The state space related to component c is noted Ωxc:

x_ck ∈ Ωxc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}
Electricity Space
Same as in Section 91.
9242 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:
u_ck = 0: no preventive maintenance on component c

u_ck = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u_1k, u_2k, ..., u_NC k)^T (93)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., W_NWc}, ∅ otherwise
9243 Transition Probability
The component state variables x_c are independent of the electricity state x_NC+1. Consequently:

P(X_{k+1} = j | Uk = U, Xk = i) (94)
= P((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) · Pk(j_NC+1, i_NC+1) (95)

The transition probabilities of the electricity state, Pk(j_NC+1, i_NC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 91.
Component states transitions
The state variables x_c are not independent of each other. Indeed, if one component fails or is in maintenance, the components do not age, since the system is not working. Consequently, different cases must be considered.
Case 1
If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc}, then

P((j1, ..., j_NC), 0, (i1, ..., i_NC)) = ∏_{c=1}^{NC} P(jc, 0, ic)
Case 2
If one of the components is in maintenance, or preventive maintenance is decided for at least one component, then

P((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, uc, ic)   if uc = 1 or ic ∉ {W1, ..., W_NWc}
P^c = 1               if ic ∈ {W1, ..., W_NWc}, uc = 0 and jc = ic
P^c = 0               otherwise

Components that keep working do not change state while the system is stopped.
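The two cases of the component-state transition probability of Section 9243 can be sketched as a small function. The per-component kernel P(jc, uc, ic) is passed in as a stub, and all names are hypothetical.

```python
# Sketch of the two-case system transition probability (component states
# only; the electricity factor is handled separately). P is the assumed
# one-component kernel P(jc, uc, ic); n_w[c] is N_Wc for component c.
def working(ic, n_wc):
    """True when ic is in {W1, ..., W_NWc}."""
    return ic.startswith("W") and 0 < int(ic[1:]) <= n_wc

def system_transition_prob(j, u, i, P, n_w):
    """P(X_{k+1} = j | u, X_k = i) for the component states."""
    all_working = all(working(ic, n_w[c]) for c, ic in enumerate(i))
    no_maint = all(uc == 0 for uc in u)
    if all_working and no_maint:                    # case 1: independent ageing
        prob = 1.0
        for c in range(len(i)):
            prob *= P(j[c], 0, i[c])
        return prob
    prob = 1.0                                      # case 2: system stopped
    for c in range(len(i)):
        if u[c] == 1 or not working(i[c], n_w[c]):  # maintained component advances
            prob *= P(j[c], u[c], i[c])
        else:                                       # working component is frozen
            prob *= 1.0 if j[c] == i[c] else 0.0
    return prob
```

Decision variables for components already in maintenance can be passed as `None`, mirroring the ∅ entries of the transition tables.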
9244 Cost Function
As for the transition probabilities, there are two cases.

Case 1: If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc}, then

C((j1, ..., j_NC), 0, (i1, ..., i_NC)) = G · Ts · CE(i_NC+1, k)

Case 2: When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) = CI + Σ_{c=1}^{NC} C^c

with

C^c = CCMc   if ic ∈ {CM1, ..., CM_{NCMc−1}} or jc = CM1
C^c = CPMc   if ic ∈ {PM1, ..., PM_{NPMc−1}} or jc = PM1
C^c = 0      otherwise
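The two-case cost function of Section 9244 can be sketched similarly. The "no failure happens" condition is interpreted here as "no component enters CM1", and all numeric values are assumptions for the example; the production reward enters as a negative cost.

```python
# Companion sketch for the multi-component stage cost of Section 9244.
# All numbers are illustrative assumptions.
C_I, G, Ts = 100.0, 1.0, 100.0
CE = lambda k: 300.0                     # electricity price (SEK/MWh), assumed
C_PMc = [150.0, 120.0]                   # per-stage PM cost per component
C_CMc = [500.0, 400.0]                   # per-stage CM cost per component

def system_cost(j, u, i, k):
    """Stage cost C(j, u, i): production reward when the system runs,
    otherwise interruption cost plus per-component maintenance costs."""
    produces = (all(ic.startswith("W") for ic in i)
                and all(uc == 0 for uc in u)
                and all(jc.startswith("W") for jc in j))   # no failure this stage
    if produces:                         # case 1: reward (negative cost)
        return -G * Ts * CE(k) / 1000.0  # kWh -> MWh for the SEK/MWh price
    total = C_I                          # case 2: interruption + maintenance
    for c in range(len(i)):
        if i[c].startswith("CM") or j[c] == "CM1":
            total += C_CMc[c]
        elif i[c].startswith("PM") or j[c] == "PM1":
            total += C_PMc[c]
    return total

print(system_cost(("W2", "W3"), (0, 0), ("W1", "W2"), 0))   # case 1: reward
print(system_cost(("CM1", "W2"), (0, 0), ("W3", "W2"), 0))  # component 0 fails
```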
93 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm was shown empirically to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may e.g. be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
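For reference, the computation above can be checked mechanically. The sketch below encodes the arc costs C(k, i, j) read off the expressions above (nodes A through J mapped to state indices 0-2 per stage) and reproduces the optimal cost J*_0(0) = 8.

```python
# Value iteration for the shortest-path example, with arc costs read off
# the stage-by-stage computation above. arcs[k][i] maps state i at stage k
# to {j: C(k, i, j)}.
arcs = [
    {0: {0: 2, 1: 4, 2: 3}},                  # stage 0: A -> B, C, D
    {0: {0: 4, 1: 6},                         # stage 1: B -> E, F
     1: {0: 2, 1: 1, 2: 3},                   #          C -> E, F, G
     2: {1: 5, 2: 2}},                        #          D -> F, G
    {0: {0: 2, 1: 5},                         # stage 2: E -> H, I
     1: {0: 7, 1: 3, 2: 2},                   #          F -> H, I, J
     2: {1: 1, 2: 2}},                        #          G -> I, J
    {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},        # stage 3: H, I, J -> terminal
]

J = {0: 0.0}                                  # phi(0) = 0 at stage 4
for k in range(3, -1, -1):                    # backward recursion
    J = {i: min(c + J[j] for j, c in succ.items())
         for i, succ in arcs[k].items()}
print(J[0])                                   # optimal cost from A  -> 8.0
```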
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A-H. Mohamed. Inspection, maintenance and replacement models. Comput Oper Res, 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Alagar Rangan, Dimple Ahyagarajan and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Acknowledgements
First of all I would like to thank my supervisors who each in their way supportedme in this work Ass Prof Lina Bertling for her encouragements constructiveremarks and for giving me the opportunity of working on this project Dr ErikDotzauer for many valuable inputs discussions and comments and Prof MichaelPatriksson for his help on mathematical writing
Special greetings to all my friends and companions of study all over the world
Finally my heart turns to my parents and my love for their endless encouragementsand support in my studies and life
Stockholm June 2007
Abbreviations

ADP Approximate Dynamic Programming
CBM Condition Based Maintenance
CM Corrective Maintenance
DP Dynamic Programming
IHSDP Infinite Horizon Stochastic Dynamic Programming
LP Linear Programming
MDP Markov Decision Process
PI Policy Iteration
PM Preventive Maintenance
RCAM Reliability Centered Asset Maintenance
RCM Reliability Centered Maintenance
SDP Stochastic Dynamic Programming
SMDP Semi-Markov Decision Process
TBM Time Based Maintenance
VI Value Iteration
Notations
Numbers
M Number of iterations for the evaluation step of modified policy iteration
N Number of stages

Constants
α Discount factor

Variables
i State at the current stage
j State at the next stage
k Stage
m Number of iterations left for the evaluation step of modified policy iteration
q Iteration number for the policy iteration algorithm
u Decision variable

State and Control Spaces
μk Function mapping the states to a decision
μ*k(i) Optimal decision at stage k for state i
μ Decision policy for stationary systems
μ* Optimal decision policy for stationary systems
π Policy
π* Optimal policy
Uk Decision action at stage k
U*k(i) Optimal decision action at stage k for state i
Xk State at stage k

Dynamic and Cost Functions
Ck(i, u) Cost function
Ck(i, u, j) Cost function
Cij(u) = C(i, u, j) Cost function if the system is stationary
CN(i) Terminal cost for state i
fk(i, u) Dynamic function
fk(i, u, ω) Stochastic dynamic function
J*k(i) Optimal cost-to-go from stage k to N starting from state i
ωk(i, u) Probabilistic function of a disturbance
Pk(j, u, i) Transition probability function
P(j, u, i) Transition probability function for stationary systems
V(Xk) Cost-to-go resulting from a trajectory starting from state Xk

Sets
ΩUk(i) Decision space at stage k for state i
ΩXk State space at stage k
Contents
Contents XI
1 Introduction 1
11 Background 1
12 Objective 2
13 Approach 2
14 Outline 2
2 Maintenance 5
21 Types of Maintenance 5
22 Maintenance Optimization Models 6
3 Introduction to the Power System 11
31 Power System Presentation 11
32 Costs 13
33 Main Constraints 13
4 Introduction to Dynamic Programming 15
41 Introduction 15
42 Deterministic Dynamic Programming 18
5 Finite Horizon Models 23
51 Problem Formulation 23
52 Optimality Equation 25
53 Value Iteration Method 25
54 The Curse of Dimensionality 26
55 Ideas for a Maintenance Optimization Model 26
6 Infinite Horizon Models - Markov Decision Processes 29
61 Problem Formulation 29
62 Optimality Equations 31
63 Value Iteration 31
64 The Policy Iteration Algorithm 31
65 Modified Policy Iteration 32
66 Average Cost-to-go Problems 33
67 Linear Programming 34
68 Efficiency of the Algorithms 35
69 Semi-Markov Decision Process 35

7 Approximate Methods for Markov Decision Process - Reinforcement Learning 37
71 Introduction 37
72 Direct Learning 38
73 Indirect Learning 41
74 Supervised Learning 42

8 Review of Models for Maintenance Optimization 43
81 Finite Horizon Dynamic Programming 43
82 Infinite Horizon Stochastic Models 44
83 Reinforcement Learning 45
84 Conclusions 45

9 A Proposed Finite Horizon Replacement Model 47
91 One-Component Model 47
92 Multi-Component model 55
93 Possible Extensions 59
10 Conclusions and Future Work 61
A Solution of the Shortest Path Example 63
Reference List 65
Chapter 1
Introduction
11 Background
The market and competition laws have been introduced among power system companies due to the restructuring and deregulation of the modern power system. The generating companies, as well as the transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.

Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 21).

CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruptions, possible deterioration of the system, human risks or environmental consequences, etc.

PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components, and Condition Based Maintenance (CBM) for the components that are worthwhile and not too expensive to monitor. These maintenance actions have costs for unsupplied energy, inspection, repair, replacement, etc.

Efficient maintenance should balance corrective and preventive maintenance to minimize the total costs of maintenance.
The probability of a functional failure for a component is stochastic. This probability depends on the state of the component, which results from the history of the component (age, intensity of use, external stress (such as weather), maintenance actions, human errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that integrate stochastic behaviors explicitly. This feature makes the models interesting and was the starting idea of this work.
12 Objective
The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization and to identify possible future applications in power systems.
13 Approach
The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.
The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.

Some SDP models found in the literature were reviewed. Conclusions were made about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.
A finite horizon replacement model was developed to illustrate the possible use of SDP for power system maintenance.
14 Outline
Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.

Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.
Chapters 4-7 focus on different Dynamic Programming (DP) approaches and algorithms to solve them. The assumptions of the models and their practical limitations are discussed. The basics of DP models are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods, for finite and infinite horizons respectively. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving Dynamic Programming infinite horizon problems using approximate methods.
Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are made about the possible use of the different approaches in maintenance optimization.
Chapter 9 is an example of how finite horizon dynamic programming can be used for maintenance optimization.
Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.
Chapter 2
Maintenance
The context of maintenance optimization is shortly described in this chapter Differ-ent types of maintenance are defined in Section 21 Some maintenance optimizationmodels are reviewed in Section 22
21 Types of Maintenance
Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required function [1]. Figure 21 shows a general picture of the different types of maintenance.

Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way to detect or prevent a failure, or when it is not worth doing so.

Preventive maintenance aims at undertaking maintenance actions on a component before it fails, to e.g. avoid the high costs of replacement, unsupplied power delivery and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:

1 Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.
[Figure: maintenance tree. Maintenance divides into Preventive Maintenance (Time-Based Maintenance (TBM), and Condition Based Maintenance (CBM): continuous, scheduled or inspection based) and Corrective Maintenance]

Figure 21 Maintenance tree, based on [1]
2 Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.
22 Maintenance Optimization Models
Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in very high costs.
The aim of maintenance optimization could be to balance corrective and preventive maintenance in order to minimize, for example, the total cost of maintenance.
Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influencing factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.
In this section the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].
2.2.1 Age Replacement Policies
Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.
A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.
A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.
An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.
2.2.2 Block Replacement Policies
In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.
This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
2.2.3 Condition Based Maintenance
CBM is being introduced in many systems to avoid unnecessary maintenance and to prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify the relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.
One question concerns the optimal limits for the monitored variables above which it is necessary to perform maintenance. The optimal wear limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.
For components subject to inspection, at each decision epoch one must decide whether maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times, and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 4) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.
An age replacement policy model that takes into account the information from condition monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (the monitored variables).
2.2.4 Opportunistic Maintenance Models
Opportunistic maintenance considers unexpected opportunities of performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: transportation to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money could be saved.
Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.
A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short-term information. The approach can be used with many maintenance optimization models.
2.2.5 Other Types of Models and Criteria of Classification
Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.
Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g. multi-component models are more interesting in power systems. The time horizon considered in the model is also important. Many articles consider an infinite time horizon; more focus should be put on finite horizons, since they are more practical. Another characteristic of a model is the time representation: whether discrete or continuous time is considered. A distinction can also be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.
The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.
3.1 Power System Presentation
Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description
A simple description of the power system includes the following main parts:
1. Generation: the generation units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.
2. Transmission: the transmission system is composed of high voltage, high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.
3. Distribution: the distribution system is at a voltage level below transmission and connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).
4. Consumption: the consumers can be divided into different categories, such as industry, commercial, household, office, agriculture, etc. The costs of interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.
The trade of electricity between producers and consumers is made through different specific markets around the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.
The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as blackouts.
3.1.2 Maintenance in Power Systems
The objective is to find the right way to do maintenance: Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to finding a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at the KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).
Research about power generation typically focuses on predictive maintenance using condition monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition monitoring systems.
3.2 Costs
Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: cost for the maintenance team that performs the maintenance actions.
• Spare part cost: the cost of a new component is an important part of the maintenance cost.
• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.
• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.
• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.
• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: the size and availability of the maintenance staff is limited.
• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.
• Weather: the weather can force certain maintenance actions to be postponed; e.g. in very windy conditions it is not possible to perform maintenance on offshore wind farms.
• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.
• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.
• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs to an optimization model.
• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of the system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that satisfies the principle of optimality:
An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not influence the actual evolution of the system and the possible actions.
Basically, in maintenance problems this means that maintenance actions have an effect on the state of the system only directly after their accomplishment. They do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.
If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.
Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner over time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval of time between two stages will have an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuum set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decision making refers to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω_{X_k}. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω_{U_k}(i).
Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).
Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J*_0(X_0) = min_{U_k} [ Σ_{k=0}^{N−1} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, ..., N − 1

N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
X_k: state at stage k
U_k: decision (action) at stage k
C_k(i, u): cost function
C_N(i): terminal cost for state i
f_k(i, u): dynamic function
J*_0(i): optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:
J*_k(i) = min_{u ∈ Ω_{U_k}(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]    (4.1)

J*_k(i): optimal cost-to-go from stage k to N, starting from state i
The value iteration algorithm is a direct consequence of the optimality equation:
J*_N(i) = C_N(i)   ∀i ∈ Ω_{X_N}

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]   ∀i ∈ Ω_{X_k}

U*_k(i) = argmin_{u ∈ Ω_{U_k}(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]   ∀i ∈ Ω_{X_k}
u: decision variable
U*_k(i): optimal decision (action) at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: a five-stage directed graph. Stage 0 contains node A; stage 1 nodes B, C, D; stage 2 nodes E, F, G; stage 3 nodes H, I, J; stage 4 node K. Each arc joins a node to a node of the next stage and is labelled with its cost.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:

Ω_{X_0} = {A} = {0}
Ω_{X_1} = {B, C, D} = {0, 1, 2}
Ω_{X_2} = {E, F, G} = {0, 1, 2}
Ω_{X_3} = {H, I, J} = {0, 1, 2}
Ω_{X_4} = {K} = {0}
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to a node of the next stage. The following notation is used:

Ω_{U_k}(i) = {0, 1} for i = 0, {0, 1, 2} for i = 1, {1, 2} for i = 2, for k = 1, 2, 3
Ω_{U_0}(0) = {0, 1, 2} for k = 0

For example, Ω_{U_1}(0) = Ω_U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E and U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω_{U_1}(2) = Ω_U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F and u_1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notation used: f_k(i, u) = u.
The transition costs are defined as equal to the distance from one state to the state resulting from the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J*_0(0) = min_{U_k ∈ Ω_{U_k}(X_k)} [ Σ_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, 2, 3
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem. The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.
The solutions of the algorithm are given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4}, with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2 and μ_1(2) = 2).
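The backward recursion and the forward policy pass can be checked numerically. The Python sketch below solves a shortest path instance of this form; only a few arc costs are stated in the text (A⇒B = 2, B⇒E = 4, B⇒F = 6, F⇒J = 2, J⇒K = 7), so the arcs marked "assumed" carry invented costs, chosen to be consistent with the stated optimum J*_0(0) = 8 along A ⇒ D ⇒ G ⇒ I ⇒ K.

```python
# Value iteration for the five-stage shortest path example.
# Only A-B, B-E, B-F, F-J and J-K costs come from the text;
# the arcs marked "assumed" are illustrative values.
costs = {
    ('A', 'B'): 2, ('A', 'C'): 4, ('A', 'D'): 3,   # A-C, A-D assumed
    ('B', 'E'): 4, ('B', 'F'): 6,
    ('C', 'E'): 3, ('C', 'F'): 5, ('C', 'G'): 2,   # assumed
    ('D', 'F'): 5, ('D', 'G'): 1,                  # assumed
    ('E', 'H'): 5, ('E', 'I'): 7,                  # assumed
    ('F', 'H'): 3, ('F', 'I'): 2, ('F', 'J'): 2,   # F-H, F-I assumed
    ('G', 'I'): 2, ('G', 'J'): 4,                  # assumed
    ('H', 'K'): 7, ('I', 'K'): 2,                  # assumed
    ('J', 'K'): 7,
}
stages = [['A'], ['B', 'C', 'D'], ['E', 'F', 'G'], ['H', 'I', 'J'], ['K']]

def value_iteration(stages, costs):
    """Backward recursion J*_k(i) = min_u [C_k(i,u) + J*_{k+1}(f_k(i,u))]."""
    J = {'K': 0.0}                     # terminal cost C_N = 0
    policy = {}
    for k in range(len(stages) - 2, -1, -1):
        for i in stages[k]:
            # admissible decisions = arcs from i into the next stage
            options = [(costs[(i, j)] + J[j], j)
                       for j in stages[k + 1] if (i, j) in costs]
            J[i], policy[i] = min(options)
    return J, policy

J, policy = value_iteration(stages, costs)
# Forward pass: recover the optimal path from the optimal policy.
path, node = ['A'], 'A'
while node != 'K':
    node = policy[node]
    path.append(node)
print(J['A'], path)   # 8.0 ['A', 'D', 'G', 'I', 'K']
```

With these assumed costs, the recursion reproduces the optimum reported above: cost 8 along A ⇒ D ⇒ G ⇒ I ⇒ K.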
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as follows:
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω_{X_k}.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω_{U_k}(i).
Dynamics of the System and Transition Probabilities
In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):
X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N − 1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control at stage k are i and u. These probabilities can also depend on the stage:
P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:
P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
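The remark above can be illustrated in code: once a control is fixed for every state, the transition probabilities P(j, u, i) collapse to an ordinary Markov chain. The two-state deterioration model below is a hypothetical sketch, not an example from the thesis.

```python
# Sketch: fixing a control for each state turns the transition
# probabilities P(j | i, u) into an ordinary Markov chain P_mu(j | i).
# The two-state deterioration model below is hypothetical.
P = {  # P[(i, u)] = {j: probability}
    (0, 'wait'):    {0: 0.8, 1: 0.2},   # working component may fail
    (0, 'replace'): {0: 1.0},
    (1, 'wait'):    {1: 1.0},           # failed component stays failed
    (1, 'replace'): {0: 1.0},
}

def markov_chain(P, mu, states):
    """Row-stochastic matrix of the chain induced by policy mu."""
    return [[P[(i, mu[i])].get(j, 0.0) for j in states] for i in states]

mu = {0: 'wait', 1: 'replace'}           # example stationary policy
M = markov_chain(P, mu, [0, 1])
print(M)   # [[0.8, 0.2], [1.0, 0.0]]
```

Each row of the resulting matrix sums to one, as required for a Markov model.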
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:
C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)
If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(i, u, j).
A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:
J*(X_0) = min_{U_k ∈ Ω_{U_k}(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N − 1
N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
X_k: state at stage k
U_k: decision (action) at stage k
ω_k(i, u): probabilistic function of the disturbance
C_k(i, u, j): cost function
C_N(i): terminal cost for state i
f_k(i, u, ω): dynamic function
J*_0(i): optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is
J*_k(i) = min_{u ∈ Ω_{U_k}(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]    (5.1)
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
J*_k(i) = min_{u ∈ Ω_{U_k}(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]    (5.2)
Ω_{X_k}: state space at stage k
Ω_{U_k}(i): decision space at stage k for state i
P_k(j, u, i): transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
J*_N(i) = C_N(i)   ∀i ∈ Ω_{X_N}   (initialisation)

while k ≥ 0 do
    J*_k(i) = min_{u ∈ Ω_{U_k}(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]   ∀i ∈ Ω_{X_k}
    U*_k(i) = argmin_{u ∈ Ω_{U_k}(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]   ∀i ∈ Ω_{X_k}
    k ← k − 1
u: decision variable
U*_k(i): optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached.
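As a sketch, the backward recursion above can be written in a few lines of Python. The two-state component model used here (0 = working, 1 = failed, with a hypothetical replacement cost, failure probability and terminal penalty) is invented for illustration only.

```python
# Minimal sketch of stochastic value iteration on a hypothetical
# two-state component: 0 = working, 1 = failed. Actions: 'keep'
# (free) or 'replace' (cost 5, returns the component to working).
# The terminal cost penalizes ending in the failed state.
N = 2                                   # number of stages
states = [0, 1]
actions = ['keep', 'replace']
C_N = {0: 0.0, 1: 10.0}                 # terminal cost
action_cost = {'keep': 0.0, 'replace': 5.0}

def P(j, u, i):
    """Transition probability P(X_{k+1}=j | X_k=i, U_k=u) (stationary)."""
    if u == 'replace':
        return 1.0 if j == 0 else 0.0
    if i == 0:                          # a working component may fail
        return 0.7 if j == 0 else 0.3
    return 1.0 if j == 1 else 0.0       # a failed component stays failed

J = dict(C_N)                           # initialisation: J*_N = C_N
policy = {}
for k in range(N - 1, -1, -1):          # backward recursion
    J_new, policy_k = {}, {}
    for i in states:
        q = {u: action_cost[u] + sum(P(j, u, i) * J[j] for j in states)
             for u in actions}
        u_star = min(q, key=q.get)
        J_new[i], policy_k[i] = q[u_star], u_star
    J, policy[k] = J_new, policy_k
print(J)   # expected cost-to-go: about 3.6 if working, 5.0 if failed
```

With these invented numbers, the optimal policy replaces a failed component at the last stage (cost 5 beats the terminal penalty 10) and never replaces preventively.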
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with
• N stages
• N_X state variables; the size of the set for each state variable is S
• N_U control variables; the size of the set for each control variable is A
The time complexity of the algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
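The growth implied by this bound is easy to tabulate. The sketch below simply evaluates N · S^(2·N_X) · A^(N_U) for arbitrary example sizes (the numbers are illustrative, not taken from the thesis).

```python
# Illustration of the curse of dimensionality: the work of finite
# horizon value iteration grows as N * S**(2*NX) * A**NU, i.e.
# exponentially in the number of state and control variables.
def complexity(N, S, NX, A, NU):
    """Operation count N * S^(2*NX) * A^NU of the VI algorithm."""
    return N * S**(2 * NX) * A**NU

base = complexity(N=10, S=20, NX=1, A=3, NU=1)   # 12 000 operations
for NX in range(1, 5):
    print(NX, complexity(N=10, S=20, NX=NX, A=3, NU=1))
# each additional state variable multiplies the count by S**2 = 400
```

Already four state variables with 20 levels each push the count close to 10^12 operations, which is why the approximation methods of Chapter 7 become necessary.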
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered in order to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used complementarily.
Of course, maintenance states should be considered in both cases. It would also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
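A combined age-and-condition state space of the kind discussed above can be sketched as follows. The horizon length and the condition levels (including the minor and major failure states) are hypothetical choices for illustration.

```python
# Hypothetical sketch of a combined state space: each state is a pair
# (age, condition), with distinct minor and major failure conditions.
# Ages beyond the planning horizon are pruned, as suggested above.
from itertools import product

HORIZON = 5                  # stages the component can age within the plan
CONDITIONS = ['new', 'worn', 'minor_failure', 'major_failure']

states = [(age, cond) for age, cond in product(range(HORIZON + 1), CONDITIONS)]
print(len(states))           # 24 states: 6 ages x 4 conditions
```

Pruning unreachable ages keeps the state space linear in the horizon length rather than in the full component lifetime.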
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or will be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. If there is no consumption, some generation units are stopped, and this time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).
This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep in memory the preceding states that have been visited. The computational price is once again very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
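The state augmentation described here can be sketched as follows; the deterioration levels are hypothetical. Keeping the previous level as an extra state variable multiplies the state space size by the number of levels, which is the computational price mentioned above.

```python
# Sketch of state augmentation: to let the dynamics depend on the
# previous deterioration level, the state is extended to the pair
# (previous_level, current_level). Levels here are hypothetical.
LEVELS = [0, 1, 2]           # deterioration levels, 2 = worst

# augmented state space: one extra variable multiplies its size by |LEVELS|
augmented = [(prev, cur) for prev in LEVELS for cur in LEVELS]

def step(state, new_level):
    """Transition: the current level becomes the remembered previous one."""
    _, cur = state
    return (cur, new_level)

s = (0, 1)                   # was at level 0, now at level 1
s = step(s, 2)
print(s)                     # (1, 2): the model remembers the preceding level
```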
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy, i.e. the solution has the form π = (μ, μ, μ, . . .), where μ is a function mapping the state space to the control space. For i ∈ ΩX, μ(i) is an admissible control for the state i: μ(i) ∈ ΩU(i).
The objective is to find the optimal policy μ*, the one that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are incurred.
J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(Xk), Xk) ]

Subject to X_{k+1} = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, . . . , N−1

μ Decision policy
J*(i) Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1): the cost incurred at stage k has the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum converges (it is dominated by a decreasing geometric progression).
J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(Xk), Xk) ]

Subject to X_{k+1} = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, . . . , N−1

α Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes neither be represented with a cost-free termination state nor be discounted.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize
J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(Xk), Xk) ]

Subject to X_{k+1} = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, . . . , N−1
6.2 Optimality Equations
The optimality equations are formulated using the probability function P(i, u, j).

The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + J*(j)], ∀i ∈ ΩX

Jμ(i) Cost-to-go function of policy μ starting from state i
J*(i) Optimal cost-to-go function for state i
For an IHSDP discounted problem the optimality equation is
J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + α · J*(j)], ∀i ∈ ΩX
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively the algorithm should converge to the optimal policy, and it can indeed be shown to converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
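As a concrete illustration, here is a minimal value iteration sketch for a discounted infinite horizon problem. The 2-state, 2-action MDP (a working/failed component with a "continue" and a "replace" action) and all numbers are hypothetical, chosen only to make the example runnable.

```python
import numpy as np

# Minimal value iteration sketch for a discounted IHSDP model.
# Hypothetical MDP: state 0 = working, state 1 = failed;
# action 0 = "continue", action 1 = "replace".
# P[u, i, j] = P(j | i, u); C[u, i, j] = cost of transition i -> j under u.
P = np.array([[[0.9, 0.1],
               [0.0, 1.0]],
              [[1.0, 0.0],
               [1.0, 0.0]]])
C = np.array([[[0.0, 5.0],
               [0.0, 10.0]],
              [[3.0, 0.0],
               [3.0, 0.0]]])
alpha = 0.9                      # discount factor

J = np.zeros(2)
for _ in range(1000):
    # Q[u, i] = sum_j P(j|i,u) * (C(j,u,i) + alpha * J(j))
    Q = (P * (C + alpha * J)).sum(axis=2)
    J_new = Q.min(axis=0)
    if np.max(np.abs(J_new - J)) < 1e-10:
        break                    # stopping criterion, as discussed above
    J = J_new
policy = Q.argmin(axis=0)
print(np.round(J, 3), policy)
```

For this toy model the iteration converges geometrically at rate α to the optimal stationary policy (operate while healthy, replace on failure).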
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step scheme is applied iteratively; the process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ0. It can then be described by the following steps.
Step 1 Policy Evaluation
If μ_{q+1} = μq, stop the algorithm. Otherwise, J_{μq}(i), the solution of the following linear system, is calculated:

J_{μq}(i) = Σ_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + J_{μq}(j)], ∀i ∈ ΩX

q Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system under the policy μq.
Step 2 Policy Improvement
A new policy is obtained using the value iteration algorithm
μ_{q+1}(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J_{μq}(j)], ∀i ∈ ΩX
Go back to the policy evaluation step.

The process stops when μ_{q+1} = μq.

At each iteration the algorithm improves the policy. If the initial policy μ0 is already good, the algorithm will converge quickly to the optimal solution.
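The two steps above can be sketched as follows, on the same kind of hypothetical 2-state, 2-action discounted MDP; the evaluation step solves the linear system directly.

```python
import numpy as np

# Policy iteration sketch (hypothetical MDP, same conventions as before:
# P[u, i, j] = P(j|i,u), C[u, i, j] = transition cost).
P = np.array([[[0.9, 0.1], [0.0, 1.0]],
              [[1.0, 0.0], [1.0, 0.0]]])
C = np.array([[[0.0, 5.0], [0.0, 10.0]],
              [[3.0, 0.0], [3.0, 0.0]]])
alpha, n = 0.9, 2

mu = np.zeros(n, dtype=int)                  # initial policy mu_0
while True:
    # Step 1: policy evaluation -- solve (I - alpha * P_mu) J = c_mu
    P_mu = P[mu, np.arange(n)]               # transition matrix under mu
    c_mu = (P_mu * C[mu, np.arange(n)]).sum(axis=1)
    J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
    # Step 2: policy improvement
    Q = (P * (C + alpha * J)).sum(axis=2)
    mu_new = Q.argmin(axis=0)
    if np.array_equal(mu_new, mu):           # policy solves its own improvement
        break
    mu = mu_new
print(mu, np.round(J, 3))
```

On this example the algorithm terminates after a handful of iterations, illustrating the finite termination property mentioned above.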
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M in order to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μk}(i) that must be chosen higher than the real value J_{μk}(i).
While m ≥ 0 do:

J^m_{μk}(i) = Σ_{j∈ΩX} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_{μk}(j)], ∀i ∈ ΩX

m ← m − 1

m Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μk} is approximated by J^0_{μk}.
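A sketch of this modification, on the same kind of hypothetical example: the exact linear solve of the evaluation step is replaced by M value-iteration sweeps under the fixed policy, starting from a value function above the true one.

```python
import numpy as np

# Modified policy iteration sketch (hypothetical 2-state MDP as before).
P = np.array([[[0.9, 0.1], [0.0, 1.0]],
              [[1.0, 0.0], [1.0, 0.0]]])
C = np.array([[[0.0, 5.0], [0.0, 10.0]],
              [[3.0, 0.0], [3.0, 0.0]]])
alpha, n, M = 0.9, 2, 20

mu = np.zeros(n, dtype=int)
J = np.full(n, 100.0)                 # initialized above the true cost-to-go
for _ in range(50):
    P_mu = P[mu, np.arange(n)]
    c_mu = (P_mu * C[mu, np.arange(n)]).sum(axis=1)
    for _ in range(M):                # approximate evaluation: M sweeps only
        J = c_mu + alpha * (P_mu @ J)
    Q = (P * (C + alpha * J)).sum(axis=2)
    mu = Q.argmin(axis=0)             # improvement step, as in plain PI
print(mu, np.round(J, 3))
```

Each sweep costs only a matrix-vector product, which is the point of the method when the state space is too large for an exact linear solve.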
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated, and convergence of the algorithms requires conditions on the Markov decision process. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a fixed state X ∈ ΩX, there exist a unique λμ and a vector hμ such that

hμ(X) = 0

λμ + hμ(i) = Σ_{j∈ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)], ∀i ∈ ΩX

This λμ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation
λ* + h*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ ΩX

μ*(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ ΩX
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. X is an arbitrary reference state and h0(i) is chosen arbitrarily.
Hk = min_{u∈ΩU(X)} Σ_{j∈ΩX} P(j, u, X) · [C(j, u, X) + hk(j)]

h_{k+1}(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk, ∀i ∈ ΩX

μ_{k+1}(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)], ∀i ∈ ΩX
The sequence hk converges if the Markov decision process is unichain, and the algorithm then converges to the optimal policy. The number of iterations needed is infinite in theory.
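The iteration above can be sketched as follows, on a hypothetical unichain 2-state MDP (a working/failed component with "continue" and "replace" actions); state 0 is taken as the reference state X.

```python
import numpy as np

# Relative value iteration sketch for the average cost-per-stage criterion.
# Hypothetical unichain MDP: P[u, i, j] = P(j|i,u), C[u, i, j] = cost.
P = np.array([[[0.9, 0.1], [0.0, 1.0]],
              [[1.0, 0.0], [1.0, 0.0]]])
C = np.array([[[0.0, 5.0], [0.0, 10.0]],
              [[3.0, 0.0], [3.0, 0.0]]])
ref = 0                               # reference state X

h = np.zeros(2)
for _ in range(500):
    Q = (P * (C + h)).sum(axis=2)     # no discounting here
    T = Q.min(axis=0)
    lam = T[ref]                      # running estimate of the average cost
    h_new = T - lam                   # renormalize so that h(ref) stays 0
    if np.max(np.abs(h_new - h)) < 1e-12:
        break
    h = h_new
mu = Q.argmin(axis=0)
print(round(float(lam), 5), mu)
```

For this toy chain the optimal average cost works out to 8/11 per stage, attained by operating while healthy and replacing on failure.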
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ_{q+1} = λq and h_{q+1}(i) = hq(i), ∀i ∈ ΩX, stop the algorithm.

Else, solve the system of equations

hq(X) = 0
λq + hq(i) = Σ_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + hq(j)], ∀i ∈ ΩX
Step 2 Policy improvement
μ_{q+1}(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hq(j)], ∀i ∈ ΩX

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that cannot be included in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case, the optimality equation is

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ ΩX

and J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈ΩX} J(i)

Subject to J(i) − α · Σ_{j∈ΩX} P(j, u, i) · J(j) ≤ Σ_{j∈ΩX} P(j, u, i) · C(j, u, i), ∀u, ∀i
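A sketch of this LP route with an off-the-shelf solver; the 2-state MDP is hypothetical and scipy's linprog is used as the LP back-end.

```python
import numpy as np
from scipy.optimize import linprog

# LP sketch for a discounted cost MDP (hypothetical 2-state example).
# Standard formulation: maximize sum_i J(i) subject to
#   J(i) <= sum_j P(j|i,u) * (C(j,u,i) + alpha * J(j))   for every (i, u);
# the optimum of this LP is the optimal cost-to-go J*.
P = np.array([[[0.9, 0.1], [0.0, 1.0]],
              [[1.0, 0.0], [1.0, 0.0]]])   # P[u, i, j]
C = np.array([[[0.0, 5.0], [0.0, 10.0]],
              [[3.0, 0.0], [3.0, 0.0]]])   # C[u, i, j]
alpha, n, m = 0.9, 2, 2

A_ub, b_ub = [], []
for i in range(n):
    for u in range(m):
        row = -alpha * P[u, i]             # -alpha * sum_j P(j|i,u) J(j)
        row[i] += 1.0                      # + J(i)
        A_ub.append(row)
        b_ub.append(P[u, i] @ C[u, i])     # expected one-stage cost
res = linprog(c=-np.ones(n),               # maximize sum_i J(i)
              A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n)
J = res.x
print(np.round(J, 3))
```

The LP has one constraint per state-action pair, which is why the approach scales worse in the state space than VI or PI, but extra (e.g. budget) constraints can simply be appended as additional rows.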
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms [28] and [29] are recommended
If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDPs).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite, and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. This chapter presents methods that overcome this problem by approximation, making use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, X_{k+1}, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, in order to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2: they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, X_{k+1}, Uk, Ck): X_{k+1} is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, X_{k+1}, Uk) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ, using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6, and can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X0, . . . , XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, X_{k+1}) = C(Xk, X_{k+1}, μ(Xk)) has been observed.
The cost-to-go resulting from the trajectory, starting from the state Xk, is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, X_{n+1})

V(Xk) Cost-to-go of a trajectory starting from state Xk
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) · Σ_{m=1}^{K} V(im)

V(im) Cost-to-go of the trajectory starting from state i at its m-th visit
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(im) − J(i)], with γ = 1/m, where m is the number of visits made so far to state i

From a trajectory point of view:

J(Xk) := J(Xk) + γ_{Xk} · [V(Xk) − J(Xk)]

γ_{Xk} corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) is calculated from the whole trajectory, so the update can only be made once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(X_{k+1}) + C(Xk, X_{k+1}).

At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assuming that the l-th transition has just been generated, J(Xk) is updated for all the states visited previously during the trajectory:

J(Xk) := J(Xk) + γ_{Xk} · [C(Xl, X_{l+1}) + J(X_{l+1}) − J(Xl)], ∀k = 0, . . . , l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) := J(Xk) + γ_{Xk} · λ^{l−k} · [C(Xl, X_{l+1}) + J(X_{l+1}) − J(Xl)], ∀k = 0, . . . , l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J(Xk) := J(Xk) + γ_{Xk} · [C(Xk, X_{k+1}) + J(X_{k+1}) − J(Xk)]
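A TD(0) sketch that evaluates a fixed policy from simulated transitions only. The 2-state policy and costs are hypothetical, and the discounted variant of the update is used for simplicity; note also that a Robbins-Monro step size m^(-0.6) replaces the 1/m rule of the text, which also converges but very slowly on this example.

```python
import numpy as np

# TD(0) sketch: estimate the cost-to-go of a fixed policy from simulated
# transitions (hypothetical example; the fixed policy is "operate in
# state 0, replace in state 1"). Discounted criterion.
rng = np.random.default_rng(1)
P_mu = np.array([[0.9, 0.1],     # transition matrix under the fixed policy
                 [1.0, 0.0]])
C_mu = np.array([[0.0, 5.0],     # transition costs under the fixed policy
                 [3.0, 0.0]])
alpha = 0.9

J = np.zeros(2)
visits = np.zeros(2)
x = 0
for _ in range(200_000):
    x_next = rng.choice(2, p=P_mu[x])
    visits[x] += 1
    gamma = visits[x] ** -0.6    # Robbins-Monro step size (not 1/m here)
    # TD(0): move J(x) toward the one-step bootstrapped target
    J[x] += gamma * (C_mu[x, x_next] + alpha * J[x_next] - J[x])
    x = x_next
print(np.round(J, 2))
```

The estimate uses only observed (state, next state, cost) samples; the matrices above serve solely to simulate the system, exactly the situation direct learning is meant for.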
Q-factors
Once J_{μk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_{μk}(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J_{μk}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈ΩU(i)} Q_{μk}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μk} and Q_{μk} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. It estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)    (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]    (7.3)
Q*(i, u) is the unique solution of this equation, and the Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (Xk, X_{k+1}, Uk, Ck), do:

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ) · Q(Xk, Uk) + γ · [C(Xk, X_{k+1}, Uk) + min_{u∈ΩU(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
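A Q-learning sketch with ε-greedy exploration on the same kind of hypothetical 2-state, 2-action discounted MDP. The transition matrix is used only to simulate the system; the update itself sees nothing but the observed samples.

```python
import numpy as np

# Q-learning sketch (hypothetical MDP; P[u, i, j], C[u, i, j] as before).
rng = np.random.default_rng(2)
P = np.array([[[0.9, 0.1], [0.0, 1.0]],
              [[1.0, 0.0], [1.0, 0.0]]])
C = np.array([[[0.0, 5.0], [0.0, 10.0]],
              [[3.0, 0.0], [3.0, 0.0]]])
alpha, eps = 0.9, 0.1

Q = np.zeros((2, 2))                 # Q[i, u]
visits = np.zeros((2, 2))
x = 0
for _ in range(300_000):
    # exploration / exploitation trade-off: mostly greedy, sometimes random
    if rng.random() < eps:
        u = int(rng.integers(2))
    else:
        u = int(Q[x].argmin())
    x_next = int(rng.choice(2, p=P[u, x]))   # simulate one transition
    visits[x, u] += 1
    gamma = visits[x, u] ** -0.6     # Robbins-Monro step, as in the TD sketch
    Q[x, u] += gamma * (C[u, x, x_next] + alpha * Q[x_next].min() - Q[x, u])
    x = x_next
policy = Q.argmin(axis=1)
print(np.round(Q, 2), policy)
```

The ε-greedy rule is one simple way to keep trying all (state, action) pairs, which is exactly the exploration requirement discussed next.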
The exploration/exploitation trade-off: Convergence of the algorithm to the optimal solution would require that all pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section on each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system, through simulation, with direct learning.
7.4 Supervised Learning
With the methods presented in the previous section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems, but for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jμ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the table representation investigated previously, Jμ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.
A general approach to a supervised learning problem can be
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no real training set exists: the training set is obtained either by simulation or from real-time samples. This is already an approximation of the real function.
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs; penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance calculated.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP, in which major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Process
Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDPs. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality: it means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an existing model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application: short-term maintenance optimization and scheduling
- Method: value iteration
- Limitation: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model; possible approaches: average cost-to-go, discounted, shortest path
- Possible applications: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
- Methods: value iteration (VI), which can converge fast for a high discount factor; policy iteration (PI), faster in general; linear programming, which allows additional constraints but handles a smaller state space than VI and PI

Approximate Dynamic Programming for MDP
- Characteristics: can handle large state spaces; can work without an explicit model
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval; complex (average cost-to-go approach)
- Possible application: optimization of inspection-based maintenance
- Methods: same as MDP
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component, and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

Conversely, if a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was incorporated into the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
N_E   Number of electricity scenarios
N_W   Number of working states for the component
N_PM  Number of preventive maintenance states for one component
N_CM  Number of corrective maintenance states for one component
Costs
C_E(s, k)  Electricity cost at stage k for electricity state s
C_I        Cost per stage for interruption
C_PM       Cost per stage of preventive maintenance
C_CM       Cost per stage of corrective maintenance
C_N(i)     Terminal cost if the component is in state i
Variables
i^1  Component state at the current stage
i^2  Electricity state at the current stage
j^1  Possible component state for the next stage
j^2  Possible electricity state for the next stage
State and Control Space
x_k^1  Component state at stage k
x_k^2  Electricity state at stage k
Probability function
λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state W_i
Sets
Ω_{x^1}  Component state space
Ω_{x^2}  Electricity state space
Ω_U(i)   Decision space for state i
States notations
W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length T_s such that T = N · T_s. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages with a cost of C_CM per stage.
• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. A preventive replacement takes N_PM stages with a cost of C_PM per stage.
• If the system is not working, a cost for interruption C_I per stage is considered.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failed, G · T_s kWh are produced during the stage (T_s in hours).
• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the scenario price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. The electricity price may switch from one scenario to another during the time span; the probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector X_k is composed of two state variables: x_k^1 for the state of the component (its age) and x_k^2 for the electricity scenario, so N_X = 2. The state of the system is thus represented by a vector as in (9.1):

X_k = (x_k^1, x_k^2)^T,   x_k^1 ∈ Ω_{x^1}, x_k^2 ∈ Ω_{x^2}   (9.1)

Ω_{x^1} is the set of possible states for the component and Ω_{x^2} the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x_k^1. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a PM state, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component are N_CM and N_PM respectively.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age T_max is reached; T_max can then, for example, correspond to the age at which λ(t) exceeds 50%. This second approach was implemented. In both cases the corresponding number of W states is N_W = T_max/T_s, rounded to the closest integer.
[Figure 9.1: Example of the Markov decision process for one component, with N_CM = 3, N_PM = 2, N_W = 4 and states W0, W1, W2, W3, W4, PM1, CM1, CM2. Solid lines (u = 0): each working state W_q moves to W_{q+1} with probability 1 − T_s·λ(q) and to CM1 with probability T_s·λ(q), with W4 staying in W4 on survival; PM and CM states advance with probability 1. Dashed lines (u = 1): transitions to PM1.]
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x_k^1 ∈ Ω_{x^1} = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ω_{x^1} = {W0, ..., W_{N_W}, PM1, ..., PM_{N_PM−1}, CM1, ..., CM_{N_CM−1}}
Electricity scenario state

Electricity scenarios are associated with one state variable x_k^2. There are N_E possible states for this variable, each corresponding to one possible electricity scenario: x_k^2 ∈ Ω_{x^2} = {S1, ..., S_{N_E}}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example with three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden, where hydropower is a large part of the electricity generation and moreover a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2: Example of electricity scenarios, N_E = 3. The electricity price (SEK/MWh) of each of the three scenarios is plotted as a function of the stage.]
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:
U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance
The decision space depends only on the component state i^1:

Ω_U(i) = {0, 1}  if i^1 ∈ {W1, ..., W_{N_W}}
       = ∅       else
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:
P(X_{k+1} = j | U_k = u, X_k = i)
= P(x_{k+1}^1 = j^1, x_{k+1}^2 = j^2 | u_k = u, x_k^1 = i^1, x_k^2 = i^2)
= P(x_{k+1}^1 = j^1 | u_k = u, x_k^1 = i^1) · P(x_{k+1}^2 = j^2 | x_k^2 = i^2)
= P(j^1, u, i^1) · P_k(j^2, i^2)
Component state transition probability

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the stage and equal to λ(W_q) = λ(q · T_s).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if N_PM = 1 (respectively N_CM = 1), then PM1 (respectively CM1) corresponds to W0.
Electricity state

The transition probabilities of the electricity state, P_k(j^2, i^2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example P_k(j^2, i^2) can take three different values, defined by the transition matrices P_E^1, P_E^2 and P_E^3; i^2 is represented by the rows of the matrices and j^2 by the columns.
Table 9.1: Transition probabilities

i^1                           u  j^1     P(j^1, u, i^1)
W_q, q ∈ {0, ..., N_W − 1}    0  W_q+1   1 − λ(W_q)
W_q, q ∈ {0, ..., N_W − 1}    0  CM1     λ(W_q)
W_{N_W}                       0  W_{N_W} 1 − λ(W_{N_W})
W_{N_W}                       0  CM1     λ(W_{N_W})
W_q, q ∈ {0, ..., N_W}        1  PM1     1
PM_q, q ∈ {1, ..., N_PM − 2}  ∅  PM_q+1  1
PM_{N_PM−1}                   ∅  W0      1
CM_q, q ∈ {1, ..., N_CM − 2}  ∅  CM_q+1  1
CM_{N_CM−1}                   ∅  W0      1
Table 9.2: Example of transition matrices for the electricity scenarios

P_E^1 = | 1 0 0 |    P_E^2 = | 1/3 1/3 1/3 |    P_E^3 = | 0.6 0.2 0.2 |
        | 0 1 0 |            | 1/3 1/3 1/3 |            | 0.2 0.6 0.2 |
        | 0 0 1 |            | 1/3 1/3 1/3 |            | 0.2 0.2 0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)       0      1      2      3      4      5      6      7      8      9      10     11
P_k(j^2, i^2)   P_E^1  P_E^1  P_E^1  P_E^3  P_E^3  P_E^2  P_E^2  P_E^2  P_E^3  P_E^1  P_E^1  P_E^1
9.1.4.4 Cost Function
The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · T_s · C_E(i^2, k) (depends on the electricity scenario state i^2 and the stage k)
• Cost of maintenance: C_CM or C_PM
• Cost of interruption: C_I
Moreover, a terminal cost, denoted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i^2 is a state variable.

A possible terminal cost is defined by C_N(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i^1                           u  j^1     C_k(j, u, i)
W_q, q ∈ {0, ..., N_W − 1}    0  W_q+1   G · T_s · C_E(i^2, k)
W_q, q ∈ {0, ..., N_W − 1}    0  CM1     C_I + C_CM
W_{N_W}                       0  W_{N_W} G · T_s · C_E(i^2, k)
W_{N_W}                       0  CM1     C_I + C_CM
W_q                           1  PM1     C_I + C_PM
PM_q, q ∈ {1, ..., N_PM − 2}  ∅  PM_q+1  C_I + C_PM
PM_{N_PM−1}                   ∅  W0      C_I + C_PM
CM_q, q ∈ {1, ..., N_CM − 2}  ∅  CM_q+1  C_I + C_CM
CM_{N_CM−1}                   ∅  W0      C_I + C_CM
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon anyway.
This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
N_C    Number of components
N_Wc   Number of working states for component c
N_PMc  Number of preventive maintenance states for component c
N_CMc  Number of corrective maintenance states for component c
Costs
C_PMc    Cost per stage of preventive maintenance for component c
C_CMc    Cost per stage of corrective maintenance for component c
C_Nc(i)  Terminal cost if component c is in state i
Variables
i^c, c ∈ {1, ..., N_C}   State of component c at the current stage
i^{N_C+1}                Electricity state at the current stage
j^c, c ∈ {1, ..., N_C}   State of component c at the next stage
j^{N_C+1}                Electricity state at the next stage
u^c, c ∈ {1, ..., N_C}   Decision variable for component c
State and Control Space
x_k^c, c ∈ {1, ..., N_C}  State of component c at stage k
x^c                       A component state
x_k^{N_C+1}               Electricity state at stage k
u_k^c                     Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ω_{x^c}        State space for component c
Ω_{x^{N_C+1}}  Electricity state space
Ω_{u^c}(i^c)   Decision space for component c in state i^c
9.2.3 Assumptions
• The system is composed of N_C components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is denoted λ_c(t) for component c ∈ {1, ..., N_C}.
• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages with a cost of C_CMc per stage.
• It is possible at each stage to decide to replace a component in order to prevent corrective maintenance. A preventive replacement of component c takes N_PMc stages with a cost of C_PMc per stage.
• An interruption cost C_I is considered whenever maintenance of any kind is carried out on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · T_s kWh are produced during the stage (T_s in hours).
• A terminal cost C_Nc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

X_k = (x_k^1, ..., x_k^{N_C}, x_k^{N_C+1})^T   (9.2)

x_k^c, c ∈ {1, ..., N_C}, represents the state of component c, and x_k^{N_C+1} represents the electricity state.
Component space

The numbers of CM and PM states for component c are N_CMc and N_PMc respectively. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is denoted Ω_{x^c}:

x_k^c ∈ Ω_{x^c} = {W0, ..., W_{N_Wc}, PM1, ..., PM_{N_PMc−1}, CM1, ..., CM_{N_CMc−1}}
Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:
u_k^c = 0: no preventive maintenance on component c
u_k^c = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

U_k = (u_k^1, u_k^2, ..., u_k^{N_C})^T   (9.3)
The decision space for each decision variable is defined by:

∀c ∈ {1, ..., N_C}: Ω_{u^c}(i^c) = {0, 1}  if i^c ∈ {W0, ..., W_{N_Wc}}
                                 = ∅       else
9.2.4.3 Transition Probabilities
The state variables x^c are independent of the electricity state x^{N_C+1}. Consequently:
P(X_{k+1} = j | U_k = U, X_k = i)   (9.4)
= P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) · P_k(j^{N_C+1}, i^{N_C+1})   (9.5)

The transition probabilities of the electricity state, P_k(j^{N_C+1}, i^{N_C+1}), are the same as in the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. Consequently, different cases must be considered.
Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., N_C}: i^c ∈ {W1, ..., W_{N_Wc}}, then

P((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = ∏_{c=1}^{N_C} P(j^c, 0, i^c)
Case 2
If at least one component is in maintenance, or preventive maintenance is decided on at least one component:

P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = ∏_{c=1}^{N_C} P^c

with

P^c = P(j^c, u^c, i^c)  if u^c = 1 or i^c ∉ {W1, ..., W_{N_Wc}}
    = 1                 if i^c = j^c and i^c ∈ {W1, ..., W_{N_Wc}}
    = 0                 else
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.
Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., N_C}: i^c ∈ {W1, ..., W_{N_Wc}}, then

C((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = G · T_s · C_E(i^{N_C+1}, k)
Case 2

When the system is in maintenance or fails during the stage, an interruption cost C_I is incurred, together with the sum of the costs of the maintenance actions:

C((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = C_I + ∑_{c=1}^{N_C} C^c

with

C^c = C_CMc  if i^c ∈ {CM1, ..., CM_{N_CMc−1}} or j^c = CM1
    = C_PMc  if i^c ∈ {PM1, ..., PM_{N_PMc−1}} or j^c = PM1
    = 0      else
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm was empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for one.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the recent advances in ADP methods this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. From this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. There are two ways of using Dynamic Programming for finite horizon models: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to the monitoring of single components (possibly with several monitored parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of the complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
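The backward recursion above can be checked mechanically. The sketch below re-computes the appendix's numbers with the arc costs transcribed into nested dictionaries (the data layout is my own choice).

```python
# cost[k][i] maps each node at stage k to {successor: arc cost},
# transcribed from the calculations above.
cost = [
    {0: {0: 2, 1: 4, 2: 3}},                                    # stage 0: A
    {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},  # B, C, D
    {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},  # E, F, G
    {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                          # H, I, J
]

def shortest_path(cost):
    J = {0: 0.0}                          # terminal cost phi(0) = 0
    for stage in reversed(cost):          # backward through the stages
        J = {i: min(c + J[j] for j, c in arcs.items())
             for i, arcs in stage.items()}
    return J

print(shortest_path(cost))  # prints {0: 8.0}, matching J*(A) = 8
```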
Abbreviations
ADP    Approximate Dynamic Programming
CBM    Condition Based Maintenance
CM     Corrective Maintenance
DP     Dynamic Programming
IHSDP  Infinite Horizon Stochastic Dynamic Programming
LP     Linear Programming
MDP    Markov Decision Process
PI     Policy Iteration
PM     Preventive Maintenance
RCAM   Reliability Centered Asset Maintenance
RCM    Reliability Centered Maintenance
SDP    Stochastic Dynamic Programming
SMDP   Semi-Markov Decision Process
TBM    Time Based Maintenance
VI     Value Iteration
Notations
Numbers
M  Number of iterations for the evaluation step of modified policy iteration
N  Number of stages

Constant
α  Discount factor

Variables
i  State at the current stage
j  State at the next stage
k  Stage
m  Number of iterations left for the evaluation step of modified policy iteration
q  Iteration number for the policy iteration algorithm
u  Decision variable

State and Control Spaces
μ_k      Function mapping the states to a decision
μ*_k(i)  Optimal decision at stage k for state i
μ        Decision policy for stationary systems
μ*       Optimal decision policy for stationary systems
π        Policy
π*       Optimal policy
U_k      Decision action at stage k
U*_k(i)  Optimal decision action at stage k for state i
X_k      State at stage k

Dynamic and Cost Functions
C_k(i, u)             Cost function
C_k(i, u, j)          Cost function
C_ij(u) = C(i, u, j)  Cost function if the system is stationary
C_N(i)                Terminal cost for state i
f_k(i, u)             Dynamic function
f_k(i, u, ω)          Stochastic dynamic function
J*_k(i)               Optimal cost-to-go from stage k to N starting from state i
ω_k(i, u)             Probabilistic function of a disturbance
P_k(j, u, i)          Transition probability function
P(j, u, i)            Transition probability function for stationary systems
V(X_k)                Cost-to-go resulting from a trajectory starting from state X_k
Sets
Ω_{U_k}(i)  Decision space at stage k for state i
Ω_{X_k}     State space at stage k
Contents
Contents XI

1 Introduction 1
1.1 Background 1
1.2 Objective 2
1.3 Approach 2
1.4 Outline 2

2 Maintenance 5
2.1 Types of Maintenance 5
2.2 Maintenance Optimization Models 6

3 Introduction to the Power System 11
3.1 Power System Presentation 11
3.2 Costs 13
3.3 Main Constraints 13

4 Introduction to Dynamic Programming 15
4.1 Introduction 15
4.2 Deterministic Dynamic Programming 18

5 Finite Horizon Models 23
5.1 Problem Formulation 23
5.2 Optimality Equation 25
5.3 Value Iteration Method 25
5.4 The Curse of Dimensionality 26
5.5 Ideas for a Maintenance Optimization Model 26

6 Infinite Horizon Models - Markov Decision Processes 29
6.1 Problem Formulation 29
6.2 Optimality Equations 31
6.3 Value Iteration 31
6.4 The Policy Iteration Algorithm 31
6.5 Modified Policy Iteration 32
6.6 Average Cost-to-go Problems 33
6.7 Linear Programming 34
6.8 Efficiency of the Algorithms 35
6.9 Semi-Markov Decision Process 35

7 Approximate Methods for Markov Decision Process - Reinforcement Learning 37
7.1 Introduction 37
7.2 Direct Learning 38
7.3 Indirect Learning 41
7.4 Supervised Learning 42

8 Review of Models for Maintenance Optimization 43
8.1 Finite Horizon Dynamic Programming 43
8.2 Infinite Horizon Stochastic Models 44
8.3 Reinforcement Learning 45
8.4 Conclusions 45

9 A Proposed Finite Horizon Replacement Model 47
9.1 One-Component Model 47
9.2 Multi-Component Model 55
9.3 Possible Extensions 59

10 Conclusions and Future Work 61

A Solution of the Shortest Path Example 63

Reference List 65
Chapter 1
Introduction
1.1 Background
The market and competition laws have been introduced among power system companies due to the restructuring and deregulation of modern power systems. The generating companies, as well as the transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.
Maintenance cost be divided into Corrective Maintenance (CM) and PreventiveMaintenance (PM) (see Chapter 21)
CM means that an asset is maintained once an unscheduled functionnal failureoccurs CM can imply high costs for unsupplied energy interruption possible de-terioration of the system human risks or environment consequences etc
PM is employed to reduce the risk of unexpected failure Time Based Maintenance(TBM) is used for the most critical components and Condition Based Maintenance(CBM) for the components that are worth and not too expensive to monitoreThese maintenance actions have a cost for unsupplied energy inspection repairreplacement etc
An efficient maintenance should balance the corrective and preventive maintenanceto minimize the total costs of maintenance
The probability of a functionnal failure for a component is stochastic The probabil-ity depends on the state of component resulting from the history of the component(age intensity of use external stress (such as weather) maintenance actions human
errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that explicitly integrate stochastic behavior. This feature makes these models interesting and was the starting idea of this work.
1.2 Objective

The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization and to identify possible future applications in power systems.

1.3 Approach

The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.

The different techniques that can be used to solve a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.

Some SDP models found in the literature were reviewed. Conclusions were drawn about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.

A finite horizon replacement model was developed to illustrate the possible use of SDP for power system maintenance.

1.4 Outline

Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.

Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.

Chapters 4-7 focus on different Dynamic Programming (DP) approaches and the algorithms to solve them. The assumptions of the models and their practical limitations are discussed. The basics of DP models are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods
for finite and infinite horizons, respectively. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving infinite horizon Dynamic Programming problems with approximate methods.

Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are drawn about the possible use of the different approaches in maintenance optimization.

Chapter 9 is an example of how finite horizon dynamic programming can be used for maintenance optimization.

Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.
Chapter 2
Maintenance
The context of maintenance optimization is briefly described in this chapter. The different types of maintenance are defined in Section 2.1. Some maintenance optimization models are reviewed in Section 2.2.

2.1 Types of Maintenance

Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 2.1 shows a general picture of the different types of maintenance.

Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed in cases where there is no way, or it is not worthwhile, to detect or prevent a failure.

Preventive maintenance aims at undertaking maintenance actions on a component before it fails, e.g. to avoid the high costs of replacement, unsupplied power and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:

1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.
[Figure 2.1: Maintenance tree, based on [1]. Maintenance is divided into Preventive Maintenance and Corrective Maintenance. Preventive Maintenance comprises Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM); CBM can be continuous, scheduled, or inspection based.]
2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual inspection, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.

2.2 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in a very high cost.

The aim of maintenance optimization could be to balance corrective and preventive maintenance so as to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influencing factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.

In this section, the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].
2.2.1 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model, the loss in value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.

2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid replacing a component that has just been replaced, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.

2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gear box, blades, etc. [32]. One problem prior to the optimization is to identify the relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question concerns the optimal limits for the monitored variables above which it is necessary to perform maintenance. The optimal wear limit for preventive replacement
of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, one must decide at each decision epoch whether maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times, and the decision on preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Section 6.9) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (the monitored variables).

2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities to perform preventive maintenance. When a component fails, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: the journey to the wind farm by boat or helicopter is necessary and can be very expensive, so by grouping maintenance actions, money can be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take short-term information into account. The approach can be combined with many maintenance optimization models.

2.2.5 Other Types of Models and Criteria of Classification

Other models integrate the possibility of a limited number of spare parts or a possible choice between different spare parts. E.g. cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components under consideration is important; e.g. multi-component models are more relevant for power systems. The time horizon considered in the model
is also important. Many articles consider an infinite time horizon. More focus should be put on finite horizons, since they are more practical. Another characteristic of a model is its time representation, i.e. whether discrete or continuous time is considered. A distinction can also be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables with limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems have been separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: the generation units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: the transmission system is composed of high voltage, high power lines. This part of the system is in general meshed. The transmission system connects the distribution systems with the generation units.
3. Distribution: the distribution system is at a voltage level below transmission and connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: the consumers can be divided into different categories, such as industry, commercial, household, office and agriculture. The costs of interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.

The trade of electricity between producers and consumers is made through different specific market places around the world. The rules and organization are different for each market place. The bids for electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (by automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.

3.1.2 Maintenance in Power Systems

The objective is to find the right way to perform maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to finding a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at the KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more
attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

3.2 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: the cost of the maintenance team that performs the maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at a specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit of time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can force certain maintenance actions to be postponed; e.g. in very windy conditions it is not possible to perform maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation takes time and has a cost.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of DP formulation and the value iteration method, a classical method for solving DP models.

4.1 Introduction

Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of the system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action results in an immediate cost (or reward) and influences the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.

4.1.1 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that satisfies the principle of optimality:
"An optimal policy has the property that whatever the initial state and initial decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision." [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not influence the evolution of the system or the possible actions.

Basically, in maintenance problems this means that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.

4.1.2 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. Consequently, stochastic maintenance optimization models are of particular interest.

4.1.3 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the time interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become computationally heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuous set of decision epochs implies that decisions can be made either continuously, at some points chosen by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decision making refers to optimal control theory and will not be discussed here.

4.1.5 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods exist for solving dynamic programming models exactly and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximate solutions of DP problems. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example: a simple shortest path problem.

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω_k^X. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω_k^U(i).

Dynamic and Cost Functions
As a result of this action, the state of the system at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

$$J_0^*(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, \ldots, N-1$.

N           Number of stages
k           Stage
i           State at the current stage
j           State at the next stage
X_k         State at stage k
U_k         Decision/action at stage k
C_k(i, u)   Cost function
C_N(i)      Terminal cost for state i
f_k(i, u)   Dynamic function
J*_0(i)     Optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be computed with the following formula:

$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad (4.1)$$

J*_k(i)   Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

$$J_N^*(i) = C_N(i) \quad \forall i \in \Omega_N^X$$

$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_k^X$$

$$U_k^*(i) = \arg\min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_k^X$$

u         Decision variable
U*_k(i)   Optimal decision/action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.
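To make the backward recursion concrete, it can be sketched in Python for a generic deterministic finite-horizon problem. The toy problem at the bottom (integer levels, unit increment costs, terminal penalty) is a hypothetical placeholder, not an example from this thesis:

```python
def value_iteration(N, states, actions, f, C, C_N):
    """Backward value iteration for a deterministic finite-horizon DP.

    states[k]     -> iterable of states at stage k
    actions(k, i) -> iterable of admissible actions in state i at stage k
    f(k, i, u)    -> next state (dynamic function)
    C(k, i, u)    -> stage cost
    C_N(i)        -> terminal cost
    Returns the cost-to-go J[k][i] and the optimal policy U[k][i].
    """
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:                      # initialisation: J*_N(i) = C_N(i)
        J[N][i] = C_N(i)
    for k in range(N - 1, -1, -1):           # backward recursion, stops at k = 0
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in actions(k, i):
                cost = C(k, i, u) + J[k + 1][f(k, i, u)]
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[k][i] = best_cost
            U[k][i] = best_u
    return J, U

# Hypothetical toy problem: state = integer level 0..2, action adds 0 or 1,
# each increment costs 1, ending below level 2 is penalised.
states = [range(3)] * 4
J, U = value_iteration(
    N=3,
    states=states,
    actions=lambda k, i: [0, 1] if i < 2 else [0],
    f=lambda k, i, u: i + u,
    C=lambda k, i, u: u,
    C_N=lambda i: 10 * (2 - i),
)
print(J[0][0])  # optimal cost-to-go from level 0: two increments suffice
```

Here the optimal cost-to-go from level 0 is 2 (two unit increments, no terminal penalty), which the recursion finds without enumerating every action sequence.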
4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: shortest path network with five stages. Stage 0: node A; Stage 1: nodes B, C, D; Stage 2: nodes E, F, G; Stage 3: nodes H, I, J; Stage 4: node K. Each arc between consecutive stages is labeled with its cost (distance).]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of every possible path. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

Ω_0^X = {A} = {0}
Ω_1^X = {B, C, D} = {0, 1, 2}
Ω_2^X = {E, F, G} = {0, 1, 2}
Ω_3^X = {H, I, J} = {0, 1, 2}
Ω_4^X = {K} = {0}
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem, the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the decision is which way to take from the current node to the next stage. The following notation is used:

Ω_k^U(i) = {0, 1} for i = 0; {0, 1, 2} for i = 1; {1, 2} for i = 2, for k = 1, 2, 3

Ω_0^U(0) = {0, 1, 2} for k = 0

For example, Ω_1^U(0) = Ω^U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E or U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω_1^U(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F or u_1(2) = 2 for the transition D ⇒ G.

A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notation used: f_k(i, u) = u.

The transition costs are defined as equal to the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

$$J_0^*(0) = \min_{U_k \in \Omega_k^U(X_k)} \left[ \sum_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, 1, \ldots, N-1$.
4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated at the last stage and then iterated backwards until
the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solutions determined by the DP algorithm for the sequence of states that is visited.

The solutions of the algorithm are given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the path A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4}, with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2, μ_1(2) = 2).
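The backward recursion can also be sketched in code on a staged network of this shape. Only a few arc costs are fixed by the text (A-B = 2, B-F = 6, F-J = 2, J-K = 7, and the optimal value J*_0(0) = 8); the remaining costs below are hypothetical placeholders chosen to be consistent with those facts, so the sketch reproduces the stated optimum and optimal path but not necessarily the original figure:

```python
# Backward value iteration on a staged shortest-path network.
# Arc costs marked "hypothetical" are NOT from the original figure; only
# A-B=2, B-F=6, F-J=2, J-K=7 and the optimum J*_0(A)=8 are stated in the text.
arcs = {
    "A": {"B": 2, "C": 4, "D": 3},   # A-B=2 from the text; A-C, A-D hypothetical
    "B": {"E": 4, "F": 6},           # B-F=6 from the text; B-E hypothetical
    "C": {"E": 2, "F": 1, "G": 3},   # hypothetical
    "D": {"F": 5, "G": 1},           # hypothetical
    "E": {"H": 2, "I": 5},           # hypothetical
    "F": {"H": 7, "I": 3, "J": 2},   # F-J=2 from the text; others hypothetical
    "G": {"I": 2, "J": 4},           # hypothetical
    "H": {"K": 2},                   # hypothetical
    "I": {"K": 2},                   # hypothetical
    "J": {"K": 7},                   # J-K=7 from the text
}

stages = [["A"], ["B", "C", "D"], ["E", "F", "G"], ["H", "I", "J"], ["K"]]

J = {"K": 0.0}                       # terminal cost-to-go
policy = {}
for nodes in reversed(stages[:-1]):  # backward recursion over stages 3..0
    for i in nodes:
        # optimal decision: successor minimising arc cost + cost-to-go
        policy[i] = min(arcs[i], key=lambda j: arcs[i][j] + J[j])
        J[i] = arcs[i][policy[i]] + J[policy[i]]

# Recover the optimal path forwards from A
path, node = ["A"], "A"
while node != "K":
    node = policy[node]
    path.append(node)
print(J["A"], path)
```

With these placeholder costs the sketch finds J["A"] = 8 along A-D-G-I-K, and the example path A-B-F-J-K still costs 2+6+2+7 = 17, matching the text.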
Chapter 5
Finite Horizon Models
In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. It introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic, as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as follows.

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω_k^X.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k from a set of admissible actions. This set can depend on the state of the system and on
the stage: u ∈ Ω_k^U(i).

Dynamics of the System and Transition Probabilities

In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \ldots, N-1$$

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
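As a small illustration of that last point, fixing a policy μ collapses the controlled transition probabilities P(j, u, i) into an ordinary Markov chain. A sketch with hypothetical two-state data (states 0 = working, 1 = failed; actions 0 = do nothing, 1 = replace), not taken from the thesis:

```python
# P[(i, u)] maps a (state, action) pair to the distribution over next states j.
# All numbers are hypothetical, for illustration only.
P = {
    (0, 0): {0: 0.7, 1: 0.3},   # do nothing while working: may fail
    (0, 1): {0: 1.0},           # replace: back to working for sure
    (1, 0): {1: 1.0},           # do nothing while failed: stays failed
    (1, 1): {0: 1.0},           # replace the failed component
}

def induced_chain(P, policy, states):
    """Markov chain transition matrix P_mu[i][j] under a fixed policy mu."""
    return [[P[(i, policy[i])].get(j, 0.0) for j in states] for i in states]

mu = {0: 0, 1: 1}               # do nothing when working, replace when failed
P_mu = induced_chain(P, mu, [0, 1])
print(P_mu)                     # [[0.7, 0.3], [1.0, 0.0]]
```

Once the policy is fixed, standard Markov-chain tools (stationary distribution, expected time to failure, etc.) apply directly to P_mu.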
Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(i, u, j).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

$$J^*(X_0) = \min_{U_k \in \Omega_k^U(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k))$, $k = 0, 1, \ldots, N-1$.
N Number of stagesk Stagei State at the current stagej State at the next stageXk State at stage kUk Decision action at stage kωk(i u) Probabilistic function of the disturbanceCk(i u j) Cost functionCN (i) Terminal cost for state ifk(i u ω) Dynamic functionJlowast0 (i) Optimal cost-to-go starting from state i
52 Optimality Equation
The optimality equation for stochastic finite horizon DP is
Jlowastk (i) = minuisinΩU
k(i)ECk(i u) + Jlowastk+1(fk(i u ω)) (51)
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
J*_k(i) = min_{u ∈ Ω_Uk(i)} Σ_{j ∈ Ω_Xk+1} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   (5.2)
Ω_Xk  State space at stage k
Ω_Uk(i)  Decision space at stage k for state i
P_k(j, u, i)  Transition probability function
5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
J*_N(i) = C_N(i)  ∀i ∈ Ω_XN   (initialisation)

While k ≥ 0 do
  J*_k(i) = min_{u ∈ Ω_Uk(i)} Σ_{j ∈ Ω_Xk+1} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]  ∀i ∈ Ω_Xk
  U*_k(i) = argmin_{u ∈ Ω_Uk(i)} Σ_{j ∈ Ω_Xk+1} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]  ∀i ∈ Ω_Xk
  k ← k − 1
u  Decision variable
U*_k(i)  Optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
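The backward recursion above can be sketched in code as follows. This is a minimal illustration, assuming the model is given as arrays indexed by stage, control and states; all names are illustrative, not from the thesis.

```python
import numpy as np

def value_iteration_finite(P, C, C_N):
    """Finite horizon stochastic DP solved by backward recursion.

    P[k][u][i][j] : transition probability P_k(j, u, i)
    C[k][u][i][j] : transition cost C_k(j, u, i)
    C_N[i]        : terminal cost
    Returns the cost-to-go table J[k][i] and the policy U[k][i].
    """
    N, n_states = len(P), len(C_N)
    J = np.zeros((N + 1, n_states))
    U = np.zeros((N, n_states), dtype=int)
    J[N] = C_N                                  # J*_N(i) = C_N(i)
    for k in range(N - 1, -1, -1):              # backward recursion
        for i in range(n_states):
            # expected cost of each control u in state i
            q = [np.dot(P[k][u][i], np.asarray(C[k][u][i]) + J[k + 1])
                 for u in range(len(P[k]))]
            U[k, i] = int(np.argmin(q))
            J[k, i] = min(q)
    return J, U
```

For each state, the expected one-stage cost plus cost-to-go is evaluated for every control and the minimizing control is stored, exactly as in the recursion above.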
5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:
• N stages

• N_X state variables, where the size of the set for each state variable is S

• N_U control variables, where the size of the set for each control variable is A
The time complexity of the algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem thus increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for a component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be taken into account to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.
Of course, maintenance states should be considered in both cases. It could also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while after a major failure a component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is no consumption, some generation units are stopped; this time can be used for maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions at offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).
This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system as well as the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space. For i ∈ Ω_X, μ(i) is an admissible control for the state i: μ(i) ∈ Ω_U(i).
The objective is to find the optimal policy μ*, which minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.
J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))),  k = 0, 1, ...

μ  Decision policy
J*(i)  Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost incurred at stage k has the form α^k · C_ij(u).
As C_ij(u) is bounded, the infinite sum converges (it is dominated by a decreasing geometric progression).
J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))),  k = 0, 1, ...

α  Discount factor
Average cost per stage problems
Some infinite horizon problems can neither be represented with a cost-free termination state nor be discounted.
To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:
J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))),  k = 0, 1, ...
6.2 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).
The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):
J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + J*(j)]  ∀i ∈ Ω_X

J_μ(i)  Cost-to-go function of policy μ starting from state i
J*(i)  Optimal cost-to-go function for state i
For an IHSDP discounted problem the optimality equation is:

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + α · J*(j)]  ∀i ∈ Ω_X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed does. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined for the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm, which terminates after a finite number of iterations.
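As a sketch, value iteration for the discounted stationary case can be implemented as follows; the stopping tolerance is an implementation choice, not part of the theory above, and the arrays are hypothetical placeholders.

```python
import numpy as np

def value_iteration_discounted(P, C, alpha, tol=1e-8):
    """Value iteration for a discounted, stationary MDP.

    P[u][i][j] : transition probability P(j, u, i)
    C[u][i][j] : transition cost C(j, u, i)
    alpha      : discount factor, 0 < alpha < 1
    Returns the optimal cost-to-go J and a greedy policy.
    """
    P, C = np.asarray(P, float), np.asarray(C, float)
    J = np.zeros(P.shape[1])
    while True:
        # Q[u][i] = sum_j P(j,u,i) * [C(j,u,i) + alpha * J(j)]
        Q = np.einsum('uij,uij->ui', P, C + alpha * J)
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:     # near the fixed point
            return J_new, Q.argmin(axis=0)
        J = J_new
```

The iteration is a contraction with modulus α, which is why the number of sweeps needed grows with 1/(1−α).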
6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ_0 and can then be described by the following steps:
Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Otherwise, J_μq(i) is calculated as the solution of the following linear system:

J_μq(i) = Σ_{j ∈ Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_μq(j)]  ∀i ∈ Ω_X

q  Iteration number for the policy iteration algorithm
q Iteration number for the policy iteration algorithm
This is the expected cost-to-go function of the system using the policy μ_q.
Step 2: Policy Improvement

A new policy is obtained by a one-step minimization, as in the value iteration algorithm:

μ_{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_μq(j)]  ∀i ∈ Ω_X
Go back to the policy evaluation step. The process stops when μ_{q+1} = μ_q.
At each iteration the algorithm improves the policy. If the initial policy μ_0 is already good, then the algorithm converges quickly to the optimal solution.
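The two steps can be sketched as follows, here for the discounted variant so that the evaluation system (I − α·P_μ)J = c_μ is always non-singular; the model arrays are hypothetical placeholders.

```python
import numpy as np

def policy_iteration(P, C, alpha):
    """Policy iteration for a discounted, stationary MDP.

    P[u][i][j] : transition probability P(j, u, i)
    C[u][i][j] : transition cost C(j, u, i)
    """
    P, C = np.asarray(P, float), np.asarray(C, float)
    n_states = P.shape[1]
    idx = np.arange(n_states)
    mu = np.zeros(n_states, dtype=int)            # initial policy mu_0
    while True:
        # Step 1: policy evaluation -- solve the linear system
        P_mu = P[mu, idx]                         # row i is P(., mu(i), i)
        c_mu = np.einsum('ij,ij->i', P_mu, C[mu, idx])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement -- greedy step on the evaluated J
        Q = np.einsum('uij,uij->ui', P, C + alpha * J)
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):            # policy is a solution
            return J, mu                          # of its own improvement
        mu = mu_new
```

The evaluation step is an exact linear solve, which is what the modified policy iteration of the next section replaces with a few approximate sweeps.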
6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the true value J_μk(i).
While m ≥ 0 do
  J^m_μk(i) = Σ_{j ∈ Ω_X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_μk(j)]  ∀i ∈ Ω_X
  m ← m − 1
m  Number of iterations left in the evaluation step of modified policy iteration
The algorithm stops when m = 0, and J_μk is approximated by J^0_μk.
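The truncated evaluation can be sketched as follows, in the discounted variant: M sweeps of the fixed-policy backup replace the exact linear solve. The names and the starting guess are illustrative.

```python
import numpy as np

def evaluate_policy_truncated(P, C, mu, alpha, M, J_init):
    """Approximate policy evaluation by M value iteration sweeps
    under the fixed policy mu (discounted variant).

    J_init is the starting guess, chosen above the true value
    in the modified policy iteration scheme.
    """
    P, C = np.asarray(P, float), np.asarray(C, float)
    idx = np.arange(P.shape[1])
    P_mu, C_mu = P[mu, idx], C[mu, idx]     # rows for u = mu(i)
    J = np.asarray(J_init, float).copy()
    for _ in range(M):                      # m = M-1, ..., 0
        J = np.einsum('ij,ij->i', P_mu, C_mu + alpha * J)
    return J
```

Each sweep contracts the error by a factor α, so a moderate M already gives a useful estimate of J_μ.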
6.6 Average Cost-to-go Problems

The methods presented in the preceding sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent stochastic shortest path problem if the underlying Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a reference state X ∈ Ω_X, there is a unique λ_μ and vector h_μ such that:

h_μ(X) = 0
λ_μ + h_μ(i) = Σ_{j ∈ Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)]  ∀i ∈ Ω_X
This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ Ω_X

μ*(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ Ω_X
6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h_0(i) is chosen arbitrarily.
H_k = min_{u ∈ Ω_U(X)} Σ_{j ∈ Ω_X} P(j, u, X) · [C(j, u, X) + h_k(j)]

h_{k+1}(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)] − H_k  ∀i ∈ Ω_X

μ_{k+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)]  ∀i ∈ Ω_X
The sequence h_k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is infinite in theory.
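A sketch of relative value iteration follows; the reference state, tolerance and iteration cap are implementation choices, and the model used with it must be unichain (and aperiodic) as discussed above.

```python
import numpy as np

def relative_value_iteration(P, C, ref=0, tol=1e-9, max_iter=100000):
    """Relative value iteration for an average cost unichain MDP.

    P[u][i][j] : transition probability P(j, u, i)
    C[u][i][j] : transition cost C(j, u, i)
    ref        : the arbitrary reference state X
    Returns the average cost per stage and a greedy policy.
    """
    P, C = np.asarray(P, float), np.asarray(C, float)
    h = np.zeros(P.shape[1])                    # h_0 chosen arbitrarily
    for _ in range(max_iter):
        Q = np.einsum('uij,uij->ui', P, C + h)  # one-step lookahead
        H = Q[:, ref].min()                     # normalising offset H_k
        h_new = Q.min(axis=0) - H               # keeps h(ref) = 0
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return H, Q.argmin(axis=0)
```

Subtracting H_k at every sweep keeps the iterates bounded; at convergence H_k approaches the optimal average cost λ*.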
6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: the reference state X can be chosen arbitrarily.
Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω_X, stop the algorithm.

Otherwise, solve the system of equations:

h_q(X) = 0
λ_q + h_q(i) = Σ_{j ∈ Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)]  ∀i ∈ Ω_X
Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h_q(j)]  ∀i ∈ Ω_X

q ← q + 1
6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case the optimal cost-to-go function satisfies:

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)]  ∀i ∈ Ω_X
J*(i) is the solution of the following linear programming model:

Maximize Σ_{i ∈ Ω_X} J(i)

subject to J(i) − α · Σ_{j ∈ Ω_X} P(j, u, i) · J(j) ≤ Σ_{j ∈ Ω_X} P(j, u, i) · C(j, u, i)  ∀i, u
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
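As a sketch, the LP above can be set up for a generic solver; since scipy's linprog minimizes, maximizing Σ_i J(i) is expressed by negating the objective. The model arrays are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """Solve a discounted MDP by linear programming:
    maximize sum_i J(i) subject to, for every state i and control u,
      J(i) - alpha * sum_j P(j,u,i) J(j) <= sum_j P(j,u,i) C(j,u,i)
    """
    P, C = np.asarray(P, float), np.asarray(C, float)
    n_actions, n_states, _ = P.shape
    A_ub, b_ub = [], []
    for u in range(n_actions):
        for i in range(n_states):
            row = -alpha * P[u, i]
            row[i] += 1.0                       # the J(i) term
            A_ub.append(row)
            b_ub.append(P[u, i] @ C[u, i])      # expected one-stage cost
    res = linprog(c=-np.ones(n_states),         # maximize sum_i J(i)
                  A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n_states)
    return res.x
```

The LP has one constraint per state-action pair, which is why the approach becomes impractical for large state spaces.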
6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.
Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m; a DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, but the actions are not taken continuously (problems with continuous control refer to optimal control theory).
SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space, and are presented in Section 7.4.
7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_{k+1}, U_k, X_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a way similar to the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample of the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation: assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_{k+1}, μ(X_k), X_k) has been observed.
The cost-to-go resulting from the trajectory starting from the state X_k is:

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k)  Cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m)  Cost-to-go of a trajectory starting from state i after the mth visit
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)],  with γ = 1/m, m being the number of the trajectory
From a trajectory point of view:

J(X_k) := J(X_k) + γ_Xk · [V(X_k) − J(X_k)]

where γ_Xk corresponds to 1/m, m being the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, which can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).
At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assume that the lth transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_Xk · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]  ∀k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_Xk · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]  ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is:

J(X_k) := J(X_k) + γ_Xk · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
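TD(0) can be sketched as follows; the terminating two-state chain used for illustration is hypothetical (its true cost-to-go from state 1 is 2), not an example from the thesis.

```python
import random
from collections import defaultdict

def td0_evaluate(sample_episode, n_episodes=5000):
    """TD(0) policy evaluation from simulated trajectories.

    sample_episode() returns a list of (X_k, X_k+1, cost) transitions
    ending at the terminal state; the step size gamma_x = 1/m, with m
    the number of visits to x, as in the text.
    """
    J = defaultdict(float)                  # estimated cost-to-go
    visits = defaultdict(int)
    for _ in range(n_episodes):
        for x, x_next, cost in sample_episode():
            visits[x] += 1
            gamma = 1.0 / visits[x]
            # TD(0) update: J(x) += gamma * [c + J(x') - J(x)]
            J[x] += gamma * (cost + J[x_next] - J[x])
    return J

# Hypothetical chain: from state 1, pay 1 and reach the terminal
# state 0 with probability 0.5, otherwise stay in state 1.
def episode():
    x, transitions = 1, []
    while x != 0:
        x_next = 0 if random.random() < 0.5 else 1
        transitions.append((x, x_next, 1.0))
        x = x_next
    return transitions
```

Unlike the whole-trajectory version, each transition updates the estimate immediately, so the method can run while the trajectory is still being generated.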
Q-factors
Once J_μk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Q_μk(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_μk(j)]

Note that P(j, u, i) and C(j, u, i) must be known for this step.
The improved policy is:

μ_{k+1}(i) = argmin_{u ∈ Ω_U(i)} Q_μk(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_μk and Q_μk have been estimated from the samples.
7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω_U(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain:

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + min_{v ∈ Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do:

U_k = argmin_{u ∈ Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω_U(X_{k+1})} Q(X_{k+1}, u)]
with γ defined as for TD
The exploration/exploitation trade-off: the convergence of the algorithm to the optimal solution requires that all pairs (i, u) are tried infinitely often, which is not realistic.
In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
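The update rule and the ε-greedy trade-off can be sketched together as follows, here in a discounted variant so that the continuing toy chain has bounded Q-factors; the two-state repair model and all parameter values are hypothetical.

```python
import random
from collections import defaultdict

def q_learning(step, actions, x0, alpha, n_steps=20000, eps=0.1):
    """Tabular Q-learning with epsilon-greedy exploration
    (discounted variant); step(x, u) returns (next_state, cost)."""
    Q = defaultdict(float)
    counts = defaultdict(int)
    x = x0
    for _ in range(n_steps):
        if random.random() < eps:                   # exploration phase
            u = random.choice(actions)
        else:                                       # greedy exploitation
            u = min(actions, key=lambda a: Q[(x, a)])
        x_next, cost = step(x, u)
        counts[(x, u)] += 1
        gamma = 1.0 / counts[(x, u)]                # gamma as for TD
        target = cost + alpha * min(Q[(x_next, a)] for a in actions)
        Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
        x = x_next
    return Q

# Hypothetical repair model: state 0 healthy, state 1 failed;
# action 1 repairs at cost 0.5, a failed unit left as-is costs 1.
def step(x, u):
    if u == 1:
        return 0, 0.5
    if x == 0:
        return (1 if random.random() < 0.1 else 0), 0.0
    return 1, 1.0
```

Note that no transition probabilities appear anywhere in the learner: the model is only needed to generate the samples, which is the point of direct learning.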
7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; for large state and control spaces, they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J(i, r), where r is a parameter vector that is optimized based on the available samples of J_μ. In the table representation investigated previously, J_μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J(i, r).
There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no real training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or a failure state. Two kinds of failure are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the calculated state probabilities and the optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm; the model is proved to be unichain before the algorithm is applied. An illustrative example is given, considering 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance actions are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDPs have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is claimed to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
• Characteristics: the model can be non-stationary
• Application in maintenance optimization: short-term maintenance scheduling
• Method: value iteration
• Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes (stationary models, classical methods; several possible approaches)
• Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI) can converge fast for a high discount factor
• Discounted cost: short-term maintenance optimization; Policy Iteration (PI) is faster in general
• Shortest path: Linear Programming allows possible additional constraints, but the tractable state space is more limited than with VI and PI

Approximate Dynamic Programming for MDP
• Characteristics: can handle large state spaces
• Application: same as MDP, for larger systems
• Methods: TD-learning, Q-learning
• Advantages: can work without an explicit model

Semi-Markov Decision Processes
• Characteristics: can optimize the inspection interval
• Application: optimization of inspection-based maintenance
• Methods: same as MDP
• Disadvantages: complex (average cost-to-go approach)
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.
The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was adopted for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydropower. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

N_E   Number of electricity scenarios
N_W   Number of working states for the component
N_PM  Number of preventive maintenance states for the component
N_CM  Number of corrective maintenance states for the component

Costs

C_E(s, k)  Electricity price at stage k in electricity state s
C_I        Cost per stage for interruption
C_PM       Cost per stage of preventive maintenance
C_CM       Cost per stage of corrective maintenance
C_N(i)     Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x_k^1  Component state at stage k
x_k^2  Electricity state at stage k

Probability functions

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state W_i

Sets

Ω_{x1}   Component state space
Ω_{x2}   Electricity state space
Ω_U(i)   Decision space for state i

State notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of a preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector X_k is composed of two state variables: x_k^1 for the state of the component (its age) and x_k^2 for the electricity scenario (N_X = 2). The state of the system is thus represented by a vector as in (9.1):

X_k = (x_k^1, x_k^2)^T,  x_k^1 ∈ Ω_{x1}, x_k^2 ∈ Ω_{x2}   (9.1)

Ω_{x1} is the set of possible states for the component and Ω_{x2} the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x_k^1. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age T_max is reached; in this case T_max can, for example, correspond to the time when λ(t) exceeds a given threshold (e.g. λ(t) > 50% for t > T_max). This second approach was implemented. In both cases, the corresponding number of W states is N_W = T_max/Ts, or the closest integer.
[Figure 9.1: Example of a Markov Decision Process for one component with N_CM = 3, N_PM = 2, N_W = 4. Solid lines: u = 0; dashed lines: u = 1. From each working state W_i, the component moves to CM1 with probability Ts·λ(i) and otherwise ages to the next working state (W4 loops on itself with probability 1 − Ts·λ(4)); the PM and CM chains progress deterministically (probability 1) back to W0.]
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x_k^1 ∈ Ω_{x1} = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ω_{x1} = {W0, ..., W_{N_W}, PM1, ..., PM_{N_PM−1}, CM1, ..., CM_{N_CM−1}}
Electricity scenario state
Electricity scenarios are associated with one state variable x_k^2. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x_k^2 ∈ Ω_{x2} = {S1, ..., S_{N_E}}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively a dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2: Example of electricity scenarios, N_E = 3. Electricity prices (SEK/MWh, ranging roughly from 200 to 500) for Scenarios 1, 2 and 3, plotted around stages k−1, k and k+1.]
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_{N_W}}; ∅ otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x_{k+1}^1 = j1, x_{k+1}^2 = j2 | u_k = u, x_k^1 = i1, x_k^2 = i2)
= P(x_{k+1}^1 = j1 | u_k = u, x_k^1 = i1) · P(x_{k+1}^2 = j2 | x_k^2 = i2)
= P(j1, u, i1) · P_k(j2, i2)
Component state transition probability

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the stage and equal to λ(W_q) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1. Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P_E^1, P_E^2 and P_E^3; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u   j1        P(j1, u, i1)
W_q, q ∈ {0, ..., N_W−1}    0   W_{q+1}   1 − λ(W_q)
W_q, q ∈ {0, ..., N_W−1}    0   CM1       λ(W_q)
W_{N_W}                     0   W_{N_W}   1 − λ(W_{N_W})
W_{N_W}                     0   CM1       λ(W_{N_W})
W_q, q ∈ {0, ..., N_W}      1   PM1       1
PM_q, q ∈ {1, ..., N_PM−2}  ∅   PM_{q+1}  1
PM_{N_PM−1}                 ∅   W0        1
CM_q, q ∈ {1, ..., N_CM−2}  ∅   CM_{q+1}  1
CM_{N_CM−1}                 ∅   W0        1
Table 9.2: Example of transition matrices for electricity scenarios

P_E^1 = [1 0 0; 0 1 0; 0 0 1]

P_E^2 = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]

P_E^3 = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0     1     2     3     4     5     6     7     8     9     10    11
P_k(j2, i2):  P_E^1 P_E^1 P_E^1 P_E^3 P_E^3 P_E^2 P_E^2 P_E^2 P_E^3 P_E^1 P_E^1 P_E^1
9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · C_E(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: C_CM or C_PM
• Cost for interruption: C_I

Moreover, a terminal cost C_N(i), defined for each possible terminal state i of the component, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4: Transition costs

i1                          u   j1        C_k(j, u, i)
W_q, q ∈ {0, ..., N_W−1}    0   W_{q+1}   G · Ts · C_E(i2, k)
W_q, q ∈ {0, ..., N_W−1}    0   CM1       C_I + C_CM
W_{N_W}                     0   W_{N_W}   G · Ts · C_E(i2, k)
W_{N_W}                     0   CM1       C_I + C_CM
W_q                         1   PM1       C_I + C_PM
PM_q, q ∈ {1, ..., N_PM−2}  ∅   PM_{q+1}  C_I + C_PM
PM_{N_PM−1}                 ∅   W0        C_I + C_PM
CM_q, q ∈ {1, ..., N_CM−2}  ∅   CM_{q+1}  C_I + C_CM
CM_{N_CM−1}                 ∅   W0        C_I + C_CM
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.
This can be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

N_C     Number of components
N_Wc    Number of working states for component c
N_PMc   Number of preventive maintenance states for component c
N_CMc   Number of corrective maintenance states for component c

Costs

C_PMc    Cost per stage of preventive maintenance for component c
C_CMc    Cost per stage of corrective maintenance for component c
C_Nc(i)  Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, ..., N_C}   State of component c at the current stage
i^{N_C+1}                Electricity state at the current stage
j^c, c ∈ {1, ..., N_C}   State of component c for the next stage
j^{N_C+1}                Electricity state for the next stage
u^c, c ∈ {1, ..., N_C}   Decision variable for component c

State and Control Space

x_k^c, c ∈ {1, ..., N_C}  State of component c at stage k
x^c                       A component state
x_k^{N_C+1}               Electricity state at stage k
u_k^c                     Maintenance decision for component c at stage k

Probability functions

λ^c(i)   Failure probability function for component c

Sets

Ω_{x^c}        State space for component c
Ω_{x^{N_C+1}}  Electricity state space
Ω_{u^c}(i^c)   Decision space for component c in state i^c
9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λ^c(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of a preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.

• An interruption cost C_I is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x_k^1, ..., x_k^{N_C}, x_k^{N_C+1})^T   (9.2)

x_k^c, c ∈ {1, ..., N_C}, represents the state of component c, and x_k^{N_C+1} represents the electricity state.
Component space

The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for the one-component model. The state space related to component c is denoted Ω_{x^c}:

x_k^c ∈ Ω_{x^c} = {W0, ..., W_{N_Wc}, PM1, ..., PM_{N_PMc−1}, CM1, ..., CM_{N_CMc−1}}

Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

u_k^c = 0: no preventive maintenance on component c
u_k^c = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u_k^1, u_k^2, ..., u_k^{N_C})^T   (9.3)

The decision space for each decision variable is defined by:

∀c ∈ {1, ..., N_C}: Ω_{u^c}(i^c) = {0, 1} if i^c ∈ {W0, ..., W_{N_Wc}}; ∅ otherwise
9.2.4.3 Transition Probabilities

The component state variables x^c are independent of the electricity state x^{N_C+1}. Consequently:

P(X_{k+1} = j | U_k = U, X_k = i)   (9.4)
= P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) · P_k(j^{N_C+1}, i^{N_C+1})   (9.5)

The transition probabilities of the electricity state, P_k(j^{N_C+1}, i^{N_C+1}), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions

The component state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., N_C}: i^c ∈ {W1, ..., W_{N_Wc}} and u^c = 0:

P((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = ∏_{c=1}^{N_C} P(j^c, 0, i^c)
Case 2

If one of the components is in maintenance, or if a preventive maintenance decision is made, then:

P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = ∏_{c=1}^{N_C} P^c

with

P^c = P(j^c, u^c, i^c)  if u^c = 1 or i^c ∉ {W1, ..., W_{N_Wc}}
P^c = 1                 if u^c = 0, i^c ∈ {W1, ..., W_{N_Wc}} and j^c = i^c
P^c = 0                 otherwise

That is, components that are maintained or already in maintenance follow their own transitions, while working components that are not maintained keep their state, since they do not age while the system is down.
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., N_C}: i^c ∈ {W1, ..., W_{N_Wc}} and u^c = 0:

C((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = G · Ts · C_E(i^{N_C+1}, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of all the maintenance costs:

C((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = C_I + Σ_{c=1}^{N_C} C^c

with

C^c = C_CMc  if i^c ∈ {CM1, ..., CM_{N_CMc}} or j^c = CM1
C^c = C_PMc  if i^c ∈ {PM1, ..., PM_{N_PMc}} or j^c = PM1
C^c = 0      otherwise
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm has empirically been shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single-state-variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The ADP methods have until now mainly been applied to optimal control, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP could, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of the complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6,  u*_2(0) = u*(E) = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5,  u*_2(1) = u*(F) = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3,  u*_2(2) = u*(G) = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10,  u*_1(0) = u*(B) = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6,  u*_1(1) = u*(C) = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5,  u*_1(2) = u*(D) = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8,  u*_0(0) = u*(A) = 2
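The computation above can be reproduced with a few lines of Python. The nested dictionary `C` encodes the arc costs of the example, with the decision u identifying the successor node; the loop is the backward-induction recursion itself.

```python
INF = float("inf")

# C[k][i][u] = cost of the arc from state i at stage k to state u at stage k+1
C = {
    0: {0: {0: 2, 1: 4, 2: 3}},                       # A -> B, C, D
    1: {0: {0: 4, 1: 6},                              # B -> E, F
        1: {0: 2, 1: 1, 2: 3},                        # C -> E, F, G
        2: {1: 5, 2: 2}},                             # D -> F, G
    2: {0: {0: 2, 1: 5},                              # E -> H, I
        1: {0: 7, 1: 3, 2: 2},                        # F -> H, I, J
        2: {1: 1, 2: 2}},                             # G -> I, J
    3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},             # H, I, J -> end node
}

J = {4: {0: 0.0}}                                     # terminal cost phi = 0
policy = {}
for k in sorted(C, reverse=True):                     # k = 3, 2, 1, 0
    J[k], policy[k] = {}, {}
    for i, arcs in C[k].items():
        u_best = min(arcs, key=lambda u: arcs[u] + J[k + 1][u])
        policy[k][i] = u_best
        J[k][i] = arcs[u_best] + J[k + 1][u_best]

print(J[0][0])   # -> 8.0, the shortest path cost from A
```

The recovered values match the hand computation: J_1 = {B: 10, C: 6, D: 5} and the optimal first decision from A is u = 2 (go to D).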
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput Oper Res, 22(4):435-441, 1995.

[3] SV Amari and LH Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006 (RAMS'06), pages 464-469, 2006.

[4] N Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] YW Archibald and R Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75-83, 1996.

[6] I Bagai and K Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156-162, 1994.

[7] R E Barlow and F Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C Berenguer, C Chu and A Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467-476, 1997.

[10] M Berg and B Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15-24, 1976.

[11] M Berg and B Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157-179, 1979.

[12] L Bertling, R Allan and R Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75-82, 2005.

[13] D P Bertsekas and J N Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] GK Chan and S Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452-456, 2006.

[15] DI Cho and M Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1-23, 1991.

[16] R Dekker, RE Wildeman and FA van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411-435, 1997.

[17] B Fox. Age replacement with discounting. Operations Research, 14(3):533-537, 1966.

[18] C Fu, L Ye, Y Liu, R Yu, B Iung, Y Cheng and Y Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179-186, 2004.

[19] A Haurie and P L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387-393, 1982.

[20] P Hilber and L Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150-155, September 2004.

[21] A Jayakumar and S Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145-149, 2004.

[22] Y Jiang, Z Zhong, J McCalley and TV Voorhis. Risk-based maintenance optimization for transmission equipment. Proc of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L P Kaelbling, M L Littman and A P Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

[24] D Kalles, A Stathaki and RE King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D Kumar and U Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507-515, 1997.

[26] P L'Ecuyer and A Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117-118, 1983.

[27] M Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006 (PMAPS 2006), International Conference on, pages 1-5, 2006.

[28] ML Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y Mansour and S Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] MKC Marwali and SM Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999 (PICA'99), Proceedings of the 21st IEEE International Conference, pages 31-37, 1999.

[31] RP Nicolai and R Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J Nilsson and L Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223-229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] KS Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293-294, 1988.

[35] KS Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556-560, 1993.

[36] Martin L Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons Inc, 1994.

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006 (PMAPS 2006), International Conference on, pages 1-6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176-1191, 2006.

[39] J Ribrant and L M Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167-173, 2007.

[40] J Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] CL Tomasevicz and S Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006 (NAPS 2006), 38th North American, pages 23-28, 2006.

[43] H Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469-489, 2002.

[44] L Wang, J Chu, W Mao and Y Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006 (WCICA 2006), The Sixth World Congress on, volume 2, 2006.

[45] R Wildeman, R Dekker and A Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] RE Wildeman, R Dekker and A Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
68
Notations

Numbers
M : Number of iterations for the evaluation step of modified policy iteration
N : Number of stages

Constants
α : Discount factor

Variables
i : State at the current stage
j : State at the next stage
k : Stage
m : Number of iterations left for the evaluation step of modified policy iteration
q : Iteration number for the policy iteration algorithm
u : Decision variable

State and Control Spaces
μ_k : Function mapping the states to a decision
μ*_k(i) : Optimal decision at stage k for state i
μ : Decision policy for stationary systems
μ* : Optimal decision policy for stationary systems
π : Policy
π* : Optimal policy
U_k : Decision action at stage k
U*_k(i) : Optimal decision action at stage k for state i
X_k : State at stage k

Dynamic and Cost Functions
C_k(i, u) : Cost function
C_k(i, u, j) : Cost function
C_ij(u) = C(i, u, j) : Cost function if the system is stationary
C_N(i) : Terminal cost for state i
f_k(i, u) : Dynamic function
f_k(i, u, ω) : Stochastic dynamic function
J*_k(i) : Optimal cost-to-go from stage k to N starting from state i
ω_k(i, u) : Probabilistic function of a disturbance
P_k(j, u, i) : Transition probability function
P(j, u, i) : Transition probability function for stationary systems
V(X_k) : Cost-to-go resulting from a trajectory starting from state X_k

Sets
Ω^U_k(i) : Decision space at stage k for state i
Ω^X_k : State space at stage k
Contents

1 Introduction
   1.1 Background
   1.2 Objective
   1.3 Approach
   1.4 Outline
2 Maintenance
   2.1 Types of Maintenance
   2.2 Maintenance Optimization Models
3 Introduction to the Power System
   3.1 Power System Presentation
   3.2 Costs
   3.3 Main Constraints
4 Introduction to Dynamic Programming
   4.1 Introduction
   4.2 Deterministic Dynamic Programming
5 Finite Horizon Models
   5.1 Problem Formulation
   5.2 Optimality Equation
   5.3 Value Iteration Method
   5.4 The Curse of Dimensionality
   5.5 Ideas for a Maintenance Optimization Model
6 Infinite Horizon Models - Markov Decision Processes
   6.1 Problem Formulation
   6.2 Optimality Equations
   6.3 Value Iteration
   6.4 The Policy Iteration Algorithm
   6.5 Modified Policy Iteration
   6.6 Average Cost-to-go Problems
   6.7 Linear Programming
   6.8 Efficiency of the Algorithms
   6.9 Semi-Markov Decision Process
7 Approximate Methods for Markov Decision Process - Reinforcement Learning
   7.1 Introduction
   7.2 Direct Learning
   7.3 Indirect Learning
   7.4 Supervised Learning
8 Review of Models for Maintenance Optimization
   8.1 Finite Horizon Dynamic Programming
   8.2 Infinite Horizon Stochastic Models
   8.3 Reinforcement Learning
   8.4 Conclusions
9 A Proposed Finite Horizon Replacement Model
   9.1 One-Component Model
   9.2 Multi-Component Model
   9.3 Possible Extensions
10 Conclusions and Future Work
A Solution of the Shortest Path Example
Reference List
Chapter 1
Introduction
1.1 Background

Market and competition laws have been introduced among power system companies due to the restructuring and deregulation of modern power systems. The generating companies, as well as the transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.
Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 2.1).
CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruptions, possible deterioration of the system, human risks or environmental consequences, etc.
PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components, and Condition Based Maintenance (CBM) for the components that are worthwhile and not too expensive to monitor. These maintenance actions have costs for unsupplied energy, inspection, repair, replacement, etc.
Efficient maintenance should balance corrective and preventive maintenance to minimize the total cost of maintenance.
The probability of a functional failure for a component is stochastic. The probability depends on the state of the component, which results from the history of the component (age, intensity of use, external stress (such as weather), maintenance actions, human errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that explicitly integrate stochastic behavior. This feature makes the models interesting and was the starting idea of this work.
1.2 Objective

The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization and to identify possible future applications in power systems.
1.3 Approach

The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.

The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.

Some SDP models found in the literature were reviewed. Conclusions were drawn about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.

A finite horizon replacement model was developed to illustrate the possible use of SDP for power system maintenance.
1.4 Outline

Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.

Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.

Chapters 4-7 focus on different Dynamic Programming (DP) approaches and the algorithms used to solve them. The assumptions of the models and their practical limitations are discussed. The basics of DP models are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods, respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving infinite horizon Dynamic Programming problems with approximate methods.
Chapter 8 reviews some maintenance optimization models based on dynamic programming. Conclusions are drawn about the possible use of the different approaches in maintenance optimization.
Chapter 9 is an example of how finite horizon dynamic programming can be used for maintenance optimization.
Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.
Chapter 2
Maintenance
The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1. Some maintenance optimization models are reviewed in Section 2.2.
2.1 Types of Maintenance

Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item, intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 2.1 shows a general picture of the different types of maintenance.

Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way, or it is not worthwhile, to detect or prevent a failure.

Preventive maintenance aims at undertaking maintenance actions on a component before it fails, e.g. to avoid the high costs of replacement, unsupplied power, and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:

1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.
[Figure 2.1: Maintenance tree, based on [1]. Maintenance divides into Preventive Maintenance and Corrective Maintenance; Preventive Maintenance divides into Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), the latter being continuous, scheduled, or inspection based.]
2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods that use diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.
2.2 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in very high costs.

The aim of maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influence factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.

In this section the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].
6
2.2.1 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.
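The trade-off in age replacement can be illustrated numerically. The sketch below is not from the thesis: the Weibull lifetime and the cost figures are illustrative assumptions. It minimizes, over the replacement age T, the standard renewal-reward cost rate: expected cycle cost (preventive cost cp if the component survives to T, corrective cost cf otherwise) divided by the expected cycle length, the integral of the survival function over [0, T].

```python
import math

def cost_rate(T, cp, cf, shape, scale, n=2000):
    """Long-run expected cost per unit time of an age replacement policy
    with replacement age T (renewal-reward argument)."""
    S = lambda t: math.exp(-((t / scale) ** shape))   # Weibull survival
    dt = T / n
    # E[cycle length] = integral of S(t) over [0, T] (trapezoidal rule)
    cycle_len = dt * ((S(0) + S(T)) / 2 + sum(S(i * dt) for i in range(1, n)))
    cycle_cost = cp * S(T) + cf * (1.0 - S(T))
    return cycle_cost / cycle_len

# Assumed numbers: preventive replacement much cheaper than corrective,
# and an increasing failure rate (Weibull shape > 1), as the policy requires.
cp, cf, shape, scale = 1.0, 10.0, 2.5, 10.0
ages = [0.5 * k for k in range(1, 61)]                # candidate ages T
T_star = min(ages, key=lambda T: cost_rate(T, cp, cf, shape, scale))
print(T_star, round(cost_rate(T_star, cp, cf, shape, scale), 3))
```

With these assumed numbers the optimal age is finite and well below the mean lifetime; with a decreasing failure rate (shape < 1) or cp close to cf, running to failure would be preferable.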
A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]: if the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process whose rate is not stationary). Two types of failure can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.
2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid replacing again a component that has just been replaced, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem, prior to the optimization, is to identify the relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.
For components subject to inspection, one must decide at each decision epoch if maintenance should be performed and when the next inspection should occur. In [2], the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9], a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (the monitored variables).
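In its usual (Cox-type) form, the proportional hazards model just described can be written as below; the covariate vector z (the monitored variables) and the coefficient vector γ are generic symbols, not notation taken from [25]:

```latex
\lambda(t, z) = \lambda_0(t)\, e^{\gamma^{\mathsf{T}} z}
```

Here λ_0(t) is the baseline hazard, depending only on time, and the exponential factor collects the multiplicative effect of the monitored variables.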
2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities to perform preventive maintenance: with the failure of one component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example, where travel to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money can be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take short-term information into account. The approach can be used with many maintenance optimization models.
2.2.5 Other Types of Models and Criteria of Classification

Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g., multi-component models are more interesting in power systems. The time horizon considered in the model is also important. Many articles consider an infinite time horizon; more focus should be put on finite horizons, since they are more practical. Another characteristic of a model is the time representation, i.e. whether discrete or continuous time is considered. A further distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.
3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: the generation units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: The transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects the distribution systems with the generation units.
3. Distribution: The distribution system is at a voltage level below transmission and is connected to the customers. It connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: The consumers can be divided into different categories: industry, commercial, household, office, agriculture, etc. The costs of interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (by automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as blackouts.
3.1.2 Maintenance in Power Systems

The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to finding a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at the KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition monitoring systems.
3.2 Costs

Possible costs and incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: The cost for the maintenance team that performs the maintenance actions.

• Spare part cost: The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: Special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: If there is an agreement to produce or deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: The size and availability of the maintenance staff is limited.

• Maintenance equipment: The equipment needed for undertaking the maintenance must be available.

• Weather: The weather can force certain maintenance actions to be postponed; e.g., in very windy conditions it is not possible to perform maintenance on offshore wind farms.

• Availability of spare parts: If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs to an optimization model.

• Statistical data: Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction

Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called the agent or controller in different contexts) observes the state of the system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action results in an immediate cost (or reward) and influences the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that satisfies the principle of optimality:
"An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision." [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decision is based only on the current state of the system; the previous decisions must not influence the actual evolution of the system and the possible actions.

Basically, in maintenance problems this means that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events, it evolves according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. Consequently, stochastic maintenance optimization models are interesting.
4.1.3 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the length of the interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm used to solve it. The section is illustrated with a classical example: a simple shortest path problem.
4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves over N stages.

State and Decision Spaces. At each stage k, the system is in a state X_k = i that belongs to a state space Ω^X_k. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω^U_k(i).

Dynamic and Cost Functions. As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).

Objective Function. The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:
$$J_0^*(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, \ldots, N-1$.
N : Number of stages
k : Stage
i : State at the current stage
j : State at the next stage
X_k : State at stage k
U_k : Decision action at stage k
C_k(i, u) : Cost function
C_N(i) : Terminal cost for state i
f_k(i, u) : Dynamic function
J*_0(i) : Optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and the Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:
$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \qquad (4.1)$$

where J*_k(i) is the optimal cost-to-go from stage k to N starting from state i.
The value iteration algorithm is a direct consequence of the optimality equation
$$J_N^*(i) = C_N(i) \quad \forall i \in \Omega_N^X$$

$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_k^X$$

$$U_k^*(i) = \arg\min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_k^X$$

where u is the decision variable and U*_k(i) is the optimal decision action at stage k for state i.
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
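As an illustration (a sketch, not code from the thesis), the backward recursion above translates almost line by line into a small program; the function names and the data layout are choices of this sketch:

```python
def value_iteration(N, states, decisions, f, C, C_N):
    """Backward value iteration for a deterministic finite horizon DP.

    states[k]       -- iterable of states at stage k, for k = 0..N
    decisions(k, i) -- admissible decisions at stage k in state i
    f(k, i, u)      -- dynamic function: the next state
    C(k, i, u)      -- stage cost of taking decision u in state i
    C_N(i)          -- terminal cost
    Returns the cost-to-go tables J[k][i] and optimal decisions U[k][i].
    """
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = C_N(i)                         # J*_N(i) = C_N(i)
    for k in range(N - 1, -1, -1):               # backwards until k = 0
        for i in states[k]:
            best = min(decisions(k, i),
                       key=lambda u: C(k, i, u) + J[k + 1][f(k, i, u)])
            U[k][i] = best
            J[k][i] = C(k, i, best) + J[k + 1][f(k, i, best)]
    return J, U
```

The returned tables correspond to J*_k(i) and U*_k(i); the optimal decision sequence is then read forwards from the initial state.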
19
4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: A stage-structured shortest path graph. Stage 0 contains node A; Stage 1 nodes B, C, D; Stage 2 nodes E, F, G; Stage 3 nodes H, I, J; Stage 4 node K. Each arc between nodes of consecutive stages is labeled with its cost.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of every possible path; for example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.
State Space. The state space is defined for each stage:
Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.

Decision Space. The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to the next stage. The following notations are used:
ΩUk (i) =
0 1 for i = 00 1 2 for i = 11 2 for i = 2
for k=123
ΩU0 (0) = 0 1 2 for k=0
For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with u_1(0) = 0 for the transition B ⇒ E and u_1(0) = 1 for the transition B ⇒ F.

Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F and u_1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined equal to the distance from one state to the state resulting from the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J*_0(0) = min_{U_k ∈ Ω^U_k(X_k)} [ Σ_{k=0}^{N−1} C_k(X_k, U_k) + C_N(X_N) ],  with N = 4

subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., N − 1
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal decisions determined by the DP algorithm for the sequence of states that is actually visited.

The solution of the algorithm is given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2 and μ_1(2) = 2).
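The backward recursion can be sketched in code. The network layout below follows the example (one node at stages 0 and 4, three nodes in between, with the same admissible decisions), but the arc costs are illustrative placeholders rather than the values of the thesis figure, so the resulting optimal cost differs from the J*_0(0) = 8 of the example.

```python
# Backward value iteration for a small deterministic shortest-path problem.
# cost[k][i][u] = arc cost of choosing successor u from state i at stage k;
# the dynamics are f_k(i, u) = u, as in the example. Costs are made up.
cost = [
    {0: {0: 2, 1: 4, 2: 3}},                                     # A -> B,C,D
    {0: {0: 4, 1: 6}, 1: {0: 3, 1: 1, 2: 5}, 2: {1: 2, 2: 2}},   # B,C,D -> E,F,G
    {0: {0: 5, 1: 7}, 1: {0: 3, 1: 2, 2: 1}, 2: {1: 2, 2: 2}},   # E,F,G -> H,I,J
    {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                           # H,I,J -> K
]
N = len(cost)

def value_iteration(cost):
    J = {0: 0.0}                      # terminal cost at K is zero
    policy = []
    for k in range(N - 1, -1, -1):    # backward recursion over stages
        Jk, Uk = {}, {}
        for i, arcs in cost[k].items():
            u = min(arcs, key=lambda u: arcs[u] + J[u])   # best successor
            Jk[i], Uk[i] = arcs[u] + J[u], u
        J = Jk
        policy.insert(0, Uk)
    return J, policy

J0, policy = value_iteration(cost)
print(J0[0], policy)   # optimal cost-to-go from A and the optimal policy
```

The recursion touches each arc once, instead of enumerating every complete path.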
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω^X_k.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).
Dynamics of the System and Transition Probabilities
In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N − 1
The effect of the disturbance can be expressed with transition probabilities. A transition probability defines the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated to each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that minimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k ∈ Ω^U_k(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N − 1
N : Number of stages
k : Stage
i : State at the current stage
j : State at the next stage
X_k : State at stage k
U_k : Decision (action) at stage k
ω_k(i, u) : Probabilistic function of the disturbance
C_k(j, u, i) : Cost function
C_N(i) : Terminal cost for state i
f_k(i, u, ω) : Dynamic function
J*_0(i) : Optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u ∈ Ω^U_k(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]   (5.1)
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
J*_k(i) = min_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   (5.2)

Ω^X_k : State space at stage k
Ω^U_k(i) : Decision space at stage k for state i
P_k(j, u, i) : Transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
J*_N(i) = C_N(i)   ∀i ∈ Ω^X_N   (initialisation)

While k ≥ 0 do

J*_k(i) = min_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀i ∈ Ω^X_k

U*_k(i) = argmin_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀i ∈ Ω^X_k

k ← k − 1

u : Decision variable
U*_k(i) : Optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
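The recursion above can be sketched on a hypothetical two-state, two-action component model; all transition probabilities and costs below are made-up illustration values, not taken from the thesis.

```python
# Finite-horizon stochastic value iteration (illustrative numbers).
# State 0 = 'good', state 1 = 'worn'; action 0 = operate, action 1 = maintain.
N = 4
states, actions = [0, 1], [0, 1]
P = {  # P[u][i][j] = transition probability (stationary here)
    0: {0: {0: 0.8, 1: 0.2}, 1: {0: 0.0, 1: 1.0}},
    1: {0: {0: 1.0, 1: 0.0}, 1: {0: 0.9, 1: 0.1}},
}
C = {0: {0: 1.0, 1: 5.0}, 1: {0: 3.0, 1: 4.0}}  # C[u][i]: expected stage cost
CN = {0: 0.0, 1: 2.0}                            # terminal cost C_N(i)

def value_iteration():
    J = dict(CN)                          # initialisation: J*_N(i) = C_N(i)
    policy = []
    for k in range(N - 1, -1, -1):        # backward recursion, k <- k - 1
        Jk, Uk = {}, {}
        for i in states:
            q = {u: C[u][i] + sum(P[u][i][j] * J[j] for j in states)
                 for u in actions}
            Uk[i] = min(q, key=q.get)     # U*_k(i)
            Jk[i] = q[Uk[i]]              # J*_k(i)
        J = Jk
        policy.insert(0, Uk)
    return J, policy

J0, policy = value_iteration()
print(J0, policy[0])
```

For a problem this small, the result can be cross-checked by enumerating all Markov policies, which is exactly what the curse of dimensionality forbids at scale.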
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with
• N stages

• N_X state variables; the size of the set for each state variable is S

• N_U control variables; the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
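To make the blow-up concrete, the operation count can be tabulated for a few assumed sizes; the values N = 52, S = 10, A = 3 below are arbitrary illustration choices, not taken from the thesis.

```python
# Growth of the value-iteration operation count N * S^(2*Nx) * A^(Nu)
# as the number of state variables Nx increases (illustrative sizes).
N, S, A, Nu = 52, 10, 3, 1
for Nx in range(1, 6):
    ops = N * S ** (2 * Nx) * A ** Nu
    print(f"Nx={Nx}: {ops:.2e} operations")
```

Each additional state variable multiplies the work by S², which is what makes exact SDP intractable for multi-component systems.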
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It would also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions on offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on time, if the system dynamics are not stationary).

This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models -
Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov decision process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for stationary FHSDP. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution of the problem has the form π = {μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω^X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω^U(i).
The objective is to find the optimal policy μ*, that is, the one that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a (cost-free) terminal state that cannot be avoided. When this state is reached, the system remains in it and no further costs are incurred.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1

μ : Decision policy
J*(i) : Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1): the cost incurred at stage k has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum converges (decreasing geometric progression).

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1

α : Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be modelled with a cost-free terminal state or with a discount factor.

To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1
6.2 Optimality Equations
The optimality equations are formulated using the transition probabilities, written P_ij(u), and the transition costs C_ij(u).

The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

J*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P_ij(u) · [C_ij(u) + J*(j)]   ∀i ∈ Ω^X

J_μ(i) : Cost-to-go function of policy μ starting from state i
J*(i) : Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P_ij(u) · [C_ij(u) + α · J*(j)]   ∀i ∈ Ω^X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed does. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.

An alternative is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
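A small sketch of value iteration for a discounted model, using the sup-norm difference between successive sweeps as a stopping criterion; the two-state MDP below is invented for illustration.

```python
# Value iteration for a discounted infinite-horizon MDP (illustrative numbers).
# Each sweep applies the Bellman operator; the gap between sweeps shrinks
# geometrically at rate alpha, which justifies the stopping criterion.
alpha = 0.9
states, actions = [0, 1], [0, 1]
P = {0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.0, 1: 1.0}},
     1: {0: {0: 1.0, 1: 0.0}, 1: {0: 0.8, 1: 0.2}}}   # P[u][i][j]
C = {0: {0: 1.0, 1: 4.0}, 1: {0: 3.0, 1: 3.5}}        # C[u][i]

def bellman(J):
    return {i: min(C[u][i] + alpha * sum(P[u][i][j] * J[j] for j in states)
                   for u in actions) for i in states}

J = {i: 0.0 for i in states}
for sweep in range(500):
    J_new = bellman(J)
    gap = max(abs(J_new[i] - J[i]) for i in states)   # sup-norm difference
    J = J_new
    if gap < 1e-10:
        break
print(J)
```

At termination J is, to numerical precision, a fixed point of the Bellman operator, i.e. a solution of the discounted optimality equation.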
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is applied iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ^0. It can then be described by the following steps.

Step 1: Policy Evaluation

If μ^{q+1} = μ^q, stop the algorithm. Otherwise, J_{μ^q}(i) is calculated as the solution of the following linear system:

J_{μ^q}(i) = Σ_{j ∈ Ω^X} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + J_{μ^q}(j)]

q : Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ^q.
Step 2: Policy Improvement

A new policy is obtained using one value iteration step:

μ^{q+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + J_{μ^q}(j)]

Go back to the policy evaluation step.

The process stops when μ^{q+1} = μ^q.
At each iteration the algorithm improves the policy. If the initial policy μ^0 is already good, then the algorithm converges fast to the optimal solution.
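The two steps can be sketched on a tiny MDP; the numbers are invented, and a discounted variant is used here so that the policy-evaluation system is always solvable.

```python
# Policy iteration on a small discounted MDP (illustrative numbers).
# Step 1 solves the 2x2 policy-evaluation linear system exactly (Cramer's
# rule); step 2 improves the policy greedily with respect to that solution.
alpha = 0.9
states, actions = [0, 1], [0, 1]
P = {0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.0, 1: 1.0}},
     1: {0: {0: 1.0, 1: 0.0}, 1: {0: 0.8, 1: 0.2}}}   # P[u][i][j]
C = {0: {0: 1.0, 1: 4.0}, 1: {0: 3.0, 1: 3.5}}        # C[u][i]

def evaluate(mu):
    # Solve J(i) = C(mu(i), i) + alpha * sum_j P(j | i, mu(i)) * J(j).
    a11 = 1 - alpha * P[mu[0]][0][0]; a12 = -alpha * P[mu[0]][0][1]
    a21 = -alpha * P[mu[1]][1][0];    a22 = 1 - alpha * P[mu[1]][1][1]
    b1, b2 = C[mu[0]][0], C[mu[1]][1]
    det = a11 * a22 - a12 * a21
    return {0: (b1 * a22 - b2 * a12) / det, 1: (a11 * b2 - a21 * b1) / det}

def improve(J):
    return {i: min(actions, key=lambda u: C[u][i] + alpha *
                   sum(P[u][i][j] * J[j] for j in states)) for i in states}

mu = {0: 0, 1: 0}                 # arbitrary initial policy mu^0
while True:
    J = evaluate(mu)              # Step 1: policy evaluation
    mu_next = improve(J)          # Step 2: policy improvement
    if mu_next == mu:             # the policy solves its own improvement
        break
    mu = mu_next
print(mu, J)
```

Since there are only finitely many policies and each iteration improves the policy, the loop terminates after at most a handful of iterations.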
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ^k}(i) that must be chosen higher than the real value J_{μ^k}(i).
While m ≥ 0 do

J^m_{μ^k}(i) = Σ_{j ∈ Ω^X} P(j, μ^k(i), i) · [C(j, μ^k(i), i) + J^{m+1}_{μ^k}(j)]   ∀i ∈ Ω^X

m ← m − 1

m : Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ^k} is approximated by J^0_{μ^k}.
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the underlying Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X̄ ∈ Ω^X, there is a unique scalar λ_μ and vector h_μ such that

h_μ(X̄) = 0

λ_μ + h_μ(i) = Σ_{j ∈ Ω^X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)]   ∀i ∈ Ω^X

This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ Ω^X

μ*(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ Ω^X
6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X̄ is an arbitrary reference state and h^0(i) is chosen arbitrarily.

H^k = min_{u ∈ Ω^U(X̄)} Σ_{j ∈ Ω^X} P(j, u, X̄) · [C(j, u, X̄) + h^k(j)]

h^{k+1}(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + h^k(j)] − H^k   ∀i ∈ Ω^X

μ^{k+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + h^k(j)]   ∀i ∈ Ω^X
The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
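A sketch of relative value iteration on an invented two-state unichain example, with state 0 playing the role of the arbitrary reference state X̄; all probabilities and costs are illustration values.

```python
# Relative value iteration for an average-cost-per-stage problem
# (illustrative two-state example). H_k tends to the optimal average
# cost lambda* when the MDP is unichain.
states, actions = [0, 1], [0, 1]
P = {0: {0: {0: 0.7, 1: 0.3}, 1: {0: 0.0, 1: 1.0}},
     1: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.6, 1: 0.4}}}   # P[u][i][j]
C = {0: {0: 1.0, 1: 4.0}, 1: {0: 3.0, 1: 3.5}}        # C[u][i]
X_bar = 0                                             # reference state

def T(h, i):  # one Bellman backup at state i
    return min(C[u][i] + sum(P[u][i][j] * h[j] for j in states)
               for u in actions)

h = {i: 0.0 for i in states}
for _ in range(1000):
    H = T(h, X_bar)                       # H_k, the average-cost estimate
    h = {i: T(h, i) - H for i in states}  # relative values; h(X_bar) = 0
print(H, h)
```

Subtracting H^k at every sweep keeps the relative values bounded, which is what makes the iteration usable when the raw cost-to-go diverges.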
6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X̄ can be chosen arbitrarily.

Step 1: Policy Evaluation
If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) ∀i ∈ Ω^X, stop the algorithm.

Otherwise, solve the system of equations

h^q(X̄) = 0

λ^q + h^q(i) = Σ_{j ∈ Ω^X} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + h^q(j)]   ∀i ∈ Ω^X

Step 2: Policy Improvement

μ^{q+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + h^q(j)]   ∀i ∈ Ω^X

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case, the optimal cost-to-go satisfies

J*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + α · J*(j)]   ∀i ∈ Ω^X

and J*(i) is the solution of the following linear programming model:

Maximize Σ_{i ∈ Ω^X} J(i)

subject to J(i) − α · Σ_{j ∈ Ω^X} P(j, u, i) · J(j) ≤ Σ_{j ∈ Ω^X} P(j, u, i) · C(j, u, i)   ∀u, i
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m; a DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Processes
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). For some applications, however, the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite, but the actions are not made continuously (problems with continuous control belong to optimal control theory).

SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach from machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further reading, the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23], are recommended.
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning, or Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_{k+1}, U_k, X_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ, using samples resulting from the use of this policy. The method can be used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state X_k, is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k) : Cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m) : Cost-to-go of a trajectory starting from state i after the m-th visit
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

where γ_{X_k} corresponds to 1/m, m being the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, and the update can only be made when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).

At each transition of the trajectory, the cost-to-go estimates of the states already visited are updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]   ∀k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]   ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0; the TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
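The TD(0) update can be sketched on a small stochastic shortest path chain; the chain, its costs and its transition probabilities are invented for illustration, and the step size γ_{X_k} is taken as one over the visit count, as above.

```python
import random

# TD(0) evaluation of a fixed policy on an illustrative stochastic
# shortest-path chain: states 0 -> 1 -> 2 -> terminal, where state 1 may
# also jump straight to the terminal state. Costs depend only on X_k here.
random.seed(0)
COSTS = {0: 2.0, 1: 1.0, 2: 3.0}

def step(s):
    # transition under the fixed policy; returns (next state or None, cost)
    if s == 0:
        return 1, COSTS[0]
    if s == 1:                       # 50/50: continue to state 2 or terminate
        return (2 if random.random() < 0.5 else None), COSTS[1]
    return None, COSTS[2]            # s == 2 always terminates

J = {0: 0.0, 1: 0.0, 2: 0.0}
visits = {0: 0, 1: 0, 2: 0}
for episode in range(20000):
    s = 0
    while s is not None:
        s_next, c = step(s)
        visits[s] += 1
        target = c + (J[s_next] if s_next is not None else 0.0)
        J[s] += (1.0 / visits[s]) * (target - J[s])   # TD(0) update
        s = s_next
print(J)
```

Each state is updated as soon as its outgoing transition is observed; the whole trajectory never needs to be stored.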
Q-factors
Once J_{μ^k}(i) has been estimated with the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_{μ^k}(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + J_{μ^k}(j)]

Note that P(j, u, i) and C(j, u, i) must be known. The improved policy is

μ^{k+1}(i) = argmin_{u ∈ Ω^U(i)} Q_{μ^k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μ^k} and Q_{μ^k} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω^U(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + min_{v ∈ Ω^U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3). Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do

U_k = argmin_{u ∈ Ω^U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω^U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
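A sketch of tabular Q-learning on an invented shortest-path MDP, with an ε-greedy rule standing in for the exploration/exploitation trade-off; all transitions and costs are illustration values.

```python
import random

# Tabular Q-learning on a tiny deterministic shortest-path MDP.
# model[(i, u)] = (next state, cost); 'T' is the cost-free terminal state.
random.seed(1)
model = {(0, 0): (1, 1.0), (0, 1): ('T', 5.0),
         (1, 0): ('T', 1.0), (1, 1): (0, 1.0)}
states, actions, eps = [0, 1], [0, 1], 0.2

Q = {(i, u): 0.0 for i in states for u in actions}
n = {(i, u): 0 for i in states for u in actions}    # visit counts for gamma
for episode in range(20000):
    s = random.choice(states)
    while s != 'T':
        if random.random() < eps:                   # exploration
            u = random.choice(actions)
        else:                                       # exploitation (greedy)
            u = min(actions, key=lambda a: Q[(s, a)])
        s_next, c = model[(s, u)]
        n[(s, u)] += 1
        gamma = 1.0 / n[(s, u)]                     # decaying step size
        q_next = 0.0 if s_next == 'T' else min(Q[(s_next, a)] for a in actions)
        Q[(s, u)] = (1 - gamma) * Q[(s, u)] + gamma * (c + q_next)
        s = s_next
print(Q)
```

No transition probabilities are ever estimated: the Q-factors are learned directly from the samples (X_k, X_{k+1}, U_k, C_k).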
The exploration/exploitation trade-off: The convergence of the algorithm to the optimal solution would require that all the pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section on each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system, through simulation, with direct learning.
7.4 Supervised Learning
With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the tabular representation investigated previously, J_μ(i) was stored for all values of i; with an approximation structure, only the vector r is stored.
A function approximator must generalize well over the state space the information gained from the samples. In other words, it should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem can be:

bull Determine an adequate structure for the approximated function, and a corresponding supervised learning method.

bull Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

bull Decide on a training algorithm.

bull Gather a training set.

bull Train the function with the training set. The function can then be validated using a subset of the training set.

bull Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist: the training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
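As a minimal illustration of the approximation step, the sketch below fits a two-parameter linear approximation J̃(i, r) = r_0·φ_0(i) + r_1·φ_1(i) by least squares; the features and the training pairs are invented for illustration.

```python
# Least-squares fit of a cost-to-go approximation J~(i, r) on (state,
# observed cost-to-go) training pairs. Only the parameter vector r is
# stored, not one value per state.

def features(i):
    return (1.0, float(i))          # phi(i): a bias and the raw state (e.g. age)

# invented training set, generated from a target that happens to be
# exactly linear, so the fit should recover it and generalize
training = [(i, 2.0 + 0.5 * i) for i in range(10)]

# normal equations (Phi^T Phi) r = Phi^T y for the 2-parameter model
s00 = sum(features(i)[0] ** 2 for i, _ in training)
s01 = sum(features(i)[0] * features(i)[1] for i, _ in training)
s11 = sum(features(i)[1] ** 2 for i, _ in training)
b0 = sum(features(i)[0] * y for i, y in training)
b1 = sum(features(i)[1] * y for i, y in training)
det = s00 * s11 - s01 * s01
r = ((b0 * s11 - b1 * s01) / det, (s00 * b1 - s01 * b0) / det)

def J_approx(i):
    return r[0] * features(i)[0] + r[1] * features(i)[1]

print(r, J_approx(20))   # the fit also predicts states outside the training set
```

In RL the training pairs would come from simulated or observed trajectories, so the fit approximates an already approximate target.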
Chapter 8

Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure during the stage of a unit not in maintenance. The failure rates
are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each is modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the actual expert systems so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is a consequence
of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high for the model to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposes this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

• Finite Horizon Dynamic Programming: the model can be non-stationary. Possible application in maintenance optimization: short-term maintenance scheduling. Limited state space (number of components).

• Markov Decision Processes: stationary models, solved with the classical methods. Possible approaches:
  – Average cost-to-go: continuous-time condition monitoring maintenance optimization. Value Iteration (VI) can converge fast for a high discount factor.
  – Discounted cost: short-term maintenance optimization. Policy Iteration (PI) is faster in general.
  – Shortest path: Linear Programming allows additional constraints, but the tractable state space is more limited than for VI and PI.

• Approximate Dynamic Programming for MDP: can handle larger state spaces than the classical MDP methods. TD-learning and Q-learning can work without an explicit model.

• Semi-Markov Decision Processes: can optimize the inspection interval; suited to inspection-based maintenance optimization. Same methods as MDP, but more complex (average cost-to-go approach).
Chapter 9

A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to
do maintenance immediately, to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE     Number of electricity scenarios
NW     Number of working states for the component
NPM    Number of preventive maintenance states for the component
NCM    Number of corrective maintenance states for the component
Costs
CE(s, k)   Electricity cost at stage k for electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i
Variables
i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage
State and Control Space
x1k   Component state at stage k
x2k   Electricity state at stage k
Probability function
λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi
Sets
Ωx1     Component state space
Ωx2     Electricity state space
ΩU(i)   Decision space for state i
States notations
W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1,  x2k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by the state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case, Tmax can for example correspond to the time when λ(t) exceeds 50%. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
50
[Figure omitted: Markov chain over the states W0, ..., W4, PM1, CM1, CM2, with transitions Wq → Wq+1 of probability 1 − Ts·λ(q), Wq → CM1 of probability Ts·λ(q), and deterministic (probability 1) transitions through the maintenance states.]

Figure 9.1: Example of a Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1.
Figure 9.1 shows an example of the graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with the state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
[Figure omitted: electricity prices (SEK/MWh, approximately 200 to 500) per stage for Scenarios 1, 2 and 3.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and ΩU(i) = ∅ otherwise.
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0   CM1      λ(Wq)
WNW                         0   WNW      1 − λ(WNW)
WNW                         0   CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1   PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1    1
PMNPM−1                     ∅   W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1    1
CMNCM−1                     ∅   W0       1
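The transition rules of Table 9.1 can be sketched in code for the example of Figure 9.1 (NW = 4, NPM = 2, NCM = 3). This is an illustrative sketch, not part of the thesis: the per-stage failure probabilities in `lam` are hypothetical values standing in for Ts · λ(Wq).

```python
# Component-state transition probabilities following Table 9.1, for the
# example of Figure 9.1: NW = 4, NPM = 2, NCM = 3.

N_W, N_PM, N_CM = 4, 2, 3
states = [f"W{q}" for q in range(N_W + 1)] + \
         [f"PM{q}" for q in range(1, N_PM)] + \
         [f"CM{q}" for q in range(1, N_CM)]
# Assumed increasing per-stage failure probabilities (hypothetical numbers).
lam = {q: 0.05 + 0.03 * q for q in range(N_W + 1)}

def p_component(j, u, i):
    """P(j | u, i) for the component state."""
    if i.startswith("W"):
        q = int(i[1:])
        if u == 1:                      # preventive replacement decided
            return 1.0 if j == "PM1" else 0.0
        nxt = f"W{min(q + 1, N_W)}"     # age one step, or stay in the last W state
        if j == nxt:
            return 1.0 - lam[q]
        if j == "CM1":
            return lam[q]
        return 0.0
    # Maintenance states progress deterministically and end in W0.
    kind, q = i[:2], int(i[2:])
    horizon = N_PM if kind == "PM" else N_CM
    nxt = f"{kind}{q + 1}" if q < horizon - 1 else "W0"
    return 1.0 if j == nxt else 0.0

# Sanity check: each row of the transition matrix sums to one.
for i in states:
    assert abs(sum(p_component(j, 0, i) for j in states) - 1.0) < 1e-12
```

Each state string encodes the condition of the component during the last stage, as in the model description; replacing `lam` with Ts · λ(q · Ts) for a real failure rate function gives the table exactly.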
Table 9.2: Example of transition matrices for the electricity scenarios

P1E = [1 0 0; 0 1 0; 0 0 1]
P2E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]
P3E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:
• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: CCM or CPM

• Cost for interruption: CI
Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable. A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                          u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0   CM1      CI + CCM
WNW                         0   WNW      G · Ts · CE(i2, k)
WNW                         0   CM1      CI + CCM
Wq                          1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1    CI + CPM
PMNPM−1                     ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1    CI + CCM
CMNCM−1                     ∅   W0       CI + CCM
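With the transition probabilities and costs in place, the one-component model can be solved by backward induction (finite horizon value iteration). The sketch below is illustrative only: all numbers (prices, costs, failure probabilities, the small state space with NW = 2) are hypothetical, rewards enter as negative costs, and a stationary electricity transition matrix is assumed for brevity.

```python
# Minimal sketch: backward induction for the one-component model over the
# joint state (component state, electricity scenario). Hypothetical data.

N = 12                       # number of stages
G_Ts = 10.0                  # energy produced per stage (kWh), assumed
C_I, C_PM, C_CM = 50.0, 20.0, 80.0
C_E = lambda s, k: [0.5, 0.3, 0.2][s]   # assumed price per kWh per scenario
P_E = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]  # assumed stationary

states = ["W0", "W1", "W2", "PM1", "CM1", "CM2"]  # NW=2, NPM=2, NCM=3
lam = {"W0": 0.05, "W1": 0.10, "W2": 0.20}        # per-stage failure prob.

def comp_transitions(i, u):
    """[(next component state, probability)] following Table 9.1."""
    if i in lam:
        if u == 1:
            return [("PM1", 1.0)]
        nxt = "W" + str(min(int(i[1]) + 1, 2))
        return [(nxt, 1.0 - lam[i]), ("CM1", lam[i])]
    return [({"PM1": "W0", "CM1": "CM2", "CM2": "W0"}[i], 1.0)]

def stage_cost(i, u, j, s, k):
    """Transition cost of Table 9.4; generation reward as negative cost."""
    if i in lam and u == 0 and j != "CM1":
        return -G_Ts * C_E(s, k)
    if j == "CM1" or i.startswith("CM"):
        return C_I + C_CM
    return C_I + C_PM

J = {(i, s): 0.0 for i in states for s in range(3)}   # zero terminal cost
policy = {}
for k in range(N - 1, -1, -1):
    Jk = {}
    for i in states:
        for s in range(3):
            # u = 0 stands for "no decision" in maintenance states.
            decisions = [0, 1] if i in lam else [0]
            best = None
            for u in decisions:
                q = sum(p * pe * (stage_cost(i, u, j, s, k) + J[(j, s2)])
                        for j, p in comp_transitions(i, u)
                        for s2, pe in enumerate(P_E[s]))
                if best is None or q < best:
                    best, policy[(k, i, s)] = q, u
            Jk[(i, s)] = best
    J = Jk
```

After the loop, `J[(i, s)]` holds the optimal expected cost-to-go from stage 0 and `policy[(k, i, s)]` the optimal replacement decision, which is the structure the value iteration algorithm of the thesis produces for this model.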
9.2 Multi-Component Model
In this section the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.
This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of renting them can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c
Costs
CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}    State of component c at the current stage
iNC+1                   Electricity state at the current stage
jc, c ∈ {1, ..., NC}    State of component c at the next stage
jNC+1                   Electricity state at the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}   State of component c at stage k
xc                      A component state
xNC+1k                  Electricity state at stage k
uck                     Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.
• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.
Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and Ωuc(ic) = ∅ otherwise.
9.2.4.3 Transition Probability
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)
Case 2
If one of the components is in maintenance, or a decision of preventive maintenance is made:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1              if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic
P^c = 0              otherwise
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with

Cc = CCMc   if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc   if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0      otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.
• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.
• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: it would be possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model is to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is being able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of its application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The ADP methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, in the literature, few finite horizon models are proposed. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A

Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0, 1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0, 1, 2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1, 2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0, 1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0, 1, 2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1, 2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0, 1, 2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
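As a sanity check, the backward recursion above can be reproduced numerically. This is a minimal sketch: the arc costs C(k, i, u) are read off the calculations, with u indexing the successor node at the next stage.

```python
# Arc costs C(k, i, u) from the shortest path example: at stage k, node i,
# choosing successor u costs C[(k, i)][u].
C = {
    (0, 0): {0: 2, 1: 4, 2: 3},
    (1, 0): {0: 4, 1: 6},
    (1, 1): {0: 2, 1: 1, 2: 3},
    (1, 2): {1: 5, 2: 2},
    (2, 0): {0: 2, 1: 5},
    (2, 1): {0: 7, 1: 3, 2: 2},
    (2, 2): {1: 1, 2: 2},
    (3, 0): {0: 4},
    (3, 1): {0: 2},
    (3, 2): {0: 7},
}

J = {(4, 0): 0.0}                      # terminal cost phi(0) = 0
for k in range(3, -1, -1):             # backward recursion over the stages
    for i in [s for (kk, s) in C if kk == k]:
        J[(k, i)] = min(c + J[(k + 1, u)] for u, c in C[(k, i)].items())

print(J[(0, 0)])   # prints 8.0, the length of the shortest path from A
```

The computed values J[(k, i)] match the hand calculation stage by stage, e.g. J[(2, 1)] = 5 for node F and J[(0, 0)] = 8 for node A.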
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A.-H. Mohamed. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006 (RAMS'06), pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Bérenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
65
[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005
[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996
[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006
[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991
[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997
[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966
[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004
[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982
[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004
[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004
[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004
[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996
[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999
66
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
67
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005
68
Ω^U_k(i)  Decision space at stage k for state i
Ω^X_k     State space at stage k
Contents
Contents XI
1 Introduction 1
11 Background 1
12 Objective 2
13 Approach 2
14 Outline 2
2 Maintenance 5
21 Types of Maintenance 5
22 Maintenance Optimization Models 6
3 Introduction to the Power System 11
31 Power System Presentation 11
32 Costs 13
33 Main Constraints 13
4 Introduction to Dynamic Programming 15
41 Introduction 15
42 Deterministic Dynamic Programming 18
5 Finite Horizon Models 23
51 Problem Formulation 23
52 Optimality Equation 25
53 Value Iteration Method 25
54 The Curse of Dimensionality 26
55 Ideas for a Maintenance Optimization Model 26
6 Infinite Horizon Models - Markov Decision Processes 29
61 Problem Formulation 29
62 Optimality Equations 31
63 Value Iteration 31
64 The Policy Iteration Algorithm 31
65 Modified Policy Iteration 32
66 Average Cost-to-go Problems 33
67 Linear Programming 34
68 Efficiency of the Algorithms 35
69 Semi-Markov Decision Process 35
7 Approximate Methods for Markov Decision Process - Reinforcement Learning 37
71 Introduction 37
72 Direct Learning 38
73 Indirect Learning 41
74 Supervised Learning 42
8 Review of Models for Maintenance Optimization 43
81 Finite Horizon Dynamic Programming 43
82 Infinite Horizon Stochastic Models 44
83 Reinforcement Learning 45
84 Conclusions 45
9 A Proposed Finite Horizon Replacement Model 47
91 One-Component Model 47
92 Multi-Component Model 55
93 Possible Extensions 59
10 Conclusions and Future Work 61
A Solution of the Shortest Path Example 63
Reference List 65
Chapter 1
Introduction
11 Background
Market mechanisms and competition laws have been introduced among power system companies due to the restructuring and deregulation of modern power systems. The generating companies, as well as the transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.
Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 21).
CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruptions, possible deterioration of the system, human risks or environmental consequences, etc.
PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components, and Condition Based Maintenance (CBM) for the components that are worthwhile and not too expensive to monitor. These maintenance actions have a cost for unsupplied energy, inspection, repair, replacement, etc.
Efficient maintenance should balance corrective and preventive maintenance to minimize the total costs of maintenance.
The probability of a functional failure of a component is stochastic. The probability depends on the state of the component, which results from the history of the component (age, intensity of use, external stress (such as weather), maintenance actions, human
errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that explicitly integrate stochastic behaviors. This feature makes the models interesting and was the starting idea of this work.
12 Objective
The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization and to identify possible future applications in power systems.
13 Approach
The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.
The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.
Some SDP models found in the literature were reviewed. Conclusions were drawn about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.
A finite horizon replacement model was developed to illustrate the possible use ofSDP for power system maintenance
14 Outline
Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.
Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.
Chapters 4-7 focus on different Dynamic Programming (DP) approaches and algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP models are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods,
respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving infinite horizon Dynamic Programming problems with approximate methods.
Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are drawn about the possible use of the different approaches in maintenance optimization.
Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization
Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.
Chapter 2
Maintenance
The context of maintenance optimization is briefly described in this chapter. The different types of maintenance are defined in Section 21, and some maintenance optimization models are reviewed in Section 22.
21 Types of Maintenance
Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 21 shows a general picture of the different types of maintenance.
Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way, or it is not worthwhile, to detect or prevent a failure.
Preventive maintenance aims at undertaking maintenance actions on a component before it fails, to e.g. avoid the high costs of replacement, unsupplied power delivery and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:
1 Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.
Figure 21 Maintenance Tree, based on [1]. (Maintenance divides into Corrective Maintenance and Preventive Maintenance; Preventive Maintenance divides into Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), which can be Continuous, Scheduled or Inspection Based.)
2 Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.
22 Maintenance Optimization Models
Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in a very high cost.
The aim of maintenance optimization could be to balance corrective and preventive maintenance in order to minimize, for example, the total cost of maintenance.
Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influence factors. Cho and Parlar [15], Dekker et al. [16] and Nicolai and Dekker [31] focus mainly on multi-component problems.
In this section, the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].
221 Age Replacement Policies
Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow and Proschan [7] describe a basic age replacement model.
A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.
A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.
An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.
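The trade-off behind the basic age replacement policy can be made concrete with a small numerical sketch. The cost-rate expression below is the standard long-run average cost of an age replacement policy (preventive cost c_p at age T, corrective cost c_f at failure); the Weibull lifetime and all parameter values are illustrative assumptions, not data from the text.

```python
import math

# Long-run cost rate of a basic age replacement policy:
#   g(T) = (c_p * R(T) + c_f * (1 - R(T))) / integral_0^T R(t) dt
# where R(t) is the survival function of the component lifetime.
# Weibull parameters and costs below are illustrative assumptions.
def cost_rate(T, c_p=1.0, c_f=10.0, shape=2.5, scale=10.0, n=2000):
    R = lambda t: math.exp(-(t / scale) ** shape)  # Weibull survival function
    dt = T / n
    # expected cycle length: integral of R over [0, T], midpoint rule
    expected_cycle_length = sum(R((i + 0.5) * dt) for i in range(n)) * dt
    expected_cycle_cost = c_p * R(T) + c_f * (1.0 - R(T))
    return expected_cycle_cost / expected_cycle_length

# Coarse grid search: replacing too early wastes preventive cost,
# replacing too late pays the corrective cost too often.
best_T = min((0.5 * j for j in range(1, 81)), key=cost_rate)
```

With these made-up numbers the minimizing age lies strictly between the two extremes, which is exactly the balance between corrective and preventive maintenance discussed above.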
222 Block Replacement Policies
In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow and Proschan [7] describe a basic block replacement model. To avoid replacing a component that has just been replaced, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.
This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
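To illustrate the block replacement trade-off, the sketch below simulates the basic policy (replace at failure and at every fixed time kT). The Weibull lifetimes, the cost values and the simulation itself are illustrative assumptions, not a model taken from the references above.

```python
import random

# Monte Carlo estimate of the long-run cost rate of a basic block
# replacement policy: corrective replacement (cost c_f) at failure,
# preventive replacement (cost c_p) at every scheduled time kT.
# All parameter values are illustrative assumptions.
def block_cost_rate(T, horizon=50_000.0, c_p=1.0, c_f=10.0,
                    shape=2.5, scale=10.0, seed=1):
    rng = random.Random(seed)
    t, cost = 0.0, 0.0
    next_block = T
    fail_at = rng.weibullvariate(scale, shape)   # first component's failure time
    while t < horizon:
        if fail_at < next_block:     # failure occurs before the next block time
            t, cost = fail_at, cost + c_f
        else:                        # scheduled replacement happens first
            t, cost = next_block, cost + c_p
            next_block += T
        fail_at = t + rng.weibullvariate(scale, shape)  # lifetime of the new component
    return cost / t
```

With these numbers, an intermediate block interval (e.g. T = 4) gives a lower estimated cost rate than both very frequent (T = 0.5) and very rare (T = 50) scheduled replacement.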
223 Condition Based Maintenance
CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation with failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.
One question concerns the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement
of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.
For components subject to inspection, at each decision epoch one must decide whether maintenance should be performed and when the next inspection should occur. In [2], inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9], a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.
An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (the monitored variables).
224 Opportunistic Maintenance Models
Opportunistic maintenance considers unexpected opportunities for performing preventive maintenance. With the failure of one component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: transportation to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money can be saved.
Haurie and L'Ecuyer [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.
A rolling horizon dynamic programming algorithm is proposed in [45] to take short term information into account. The approach can be applied to many maintenance optimization models.
225 Other Types of Models and Criteria of Classifications
Other models integrate the possibility of a limited number of spare parts or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.
Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g., multi-component models are more interesting for power systems. The time horizon considered in the model
is important. Many articles consider an infinite time horizon. More focus should be put on finite horizons, since they are more practical. Another characteristic of a model is its time representation, i.e. whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.
The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power
System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.
31 Power System Presentation
Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
311 Power System Description
A simple description of the power system includes the following main parts:
1 Generation: the generation units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.
2 Transmission: the transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.
3 Distribution: the distribution system is at a voltage level below transmission and connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).
4 Consumption: consumers can be divided into different categories, e.g. industry, commercial, household, office, agriculture, etc. The costs of interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.
The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.
The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as blackouts.
312 Maintenance in Power System
The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to finding a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at the KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber and Bertling [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).
Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more
attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
32 Costs
Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: cost for the maintenance team that performs the maintenance actions.
• Spare part cost: the cost of a new component is an important part of the maintenance cost.
• Maintenance equipment cost: incurred if special equipment is needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.
• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.
• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.
• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
33 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: the size and availability of the maintenance staff is limited.
• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.
• Weather: the weather can force certain maintenance actions to be postponed; e.g., in very windy conditions it is not possible to carry out maintenance on offshore wind farms.
• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a cost and takes time.
• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.
• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.
• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic
Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
41 Introduction
Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
411 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:
An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage, decisions are based only on the current state of the system. Previous decisions should have no influence on the current evolution of the system or on the possible actions.
Basically, in maintenance problems it would mean that maintenance actions have an effect on the state of the system only directly after their accomplishment. They do not influence the deterioration process after they have been completed.
412 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.
If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.
Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.
413 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
414 Decision Time
In this thesis, we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 5 to 7). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the time interval between two stages will have an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.
415 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 54).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
42 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, along with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.
421 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision SpacesAt each stage k the system is in a state Xk = i that belongs to a state space ΩXk Depending on the state of the system the decision maker decide of an action to dou = Uk isin ΩUk (i)
Dynamic and Cost FunctionsAs a result of this action the system state at next stage will be Xk+1 = fk(i u)Moreover the action has a cost that the decision maker has to pay Ck(i u) A pos-sible terminal cost is associated to the terminal state (state at stage N) (CN (XN )
Objective FunctionThe objective is to determine the sequence of decision that will mimimize the cu-mulative cost (also called cost-to-go function) subject to the dynamic of the system
$$J_0^*(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k), \quad k = 0, \ldots, N-1$

$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision (action) at stage $k$
$C_k(i, u)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u)$: dynamic function
$J_0^*(i)$: optimal cost-to-go starting from state $i$
4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage $k$ can be derived with the following formula:

$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \qquad (4.1)$$

$J_k^*(i)$: optimal cost-to-go from stage $k$ to $N$, starting from state $i$
The value iteration algorithm is a direct consequence of the optimality equation:

$$J_N^*(i) = C_N(i) \quad \forall i \in \Omega_N^X$$

$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_k^X$$

$$U_k^*(i) = \arg\min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_k^X$$

$u$: decision variable
$U_k^*(i)$: optimal decision (action) at stage $k$ for state $i$

The algorithm goes backwards, starting from the last stage. It stops when $k = 0$.
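This backward recursion translates directly into code. The sketch below is a minimal, generic implementation; the function names and the way the model is passed in are assumptions for illustration, not notation from the thesis:

```python
def value_iteration(N, states, controls, f, C, CN):
    """Backward value iteration for a deterministic finite-horizon DP.

    states[k] lists the admissible states at stage k, controls(k, i) the
    admissible decisions, f(k, i, u) the dynamics and C(k, i, u) the stage
    cost; CN(i) is the terminal cost.  Returns the cost-to-go tables J
    and the optimal decision tables U.
    """
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = CN(i)                      # initialization: J*_N(i) = C_N(i)
    for k in range(N - 1, -1, -1):           # k = N-1, ..., 0
        for i in states[k]:
            best_u, best = None, float("inf")
            for u in controls(k, i):
                cost = C(k, i, u) + J[k + 1][f(k, i, u)]
                if cost < best:
                    best_u, best = u, cost
            J[k][i], U[k][i] = best, best_u
    return J, U
```

The recursion stops at $k = 0$, after which the optimal decision sequence can be read off forward from the tables `U`.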
4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: five-stage network. Node A (stage 0) is connected to nodes B, C, D (stage 1), which are connected to E, F, G (stage 2), then H, I, J (stage 3), and finally to the terminal node K (stage 4). Each arc is labeled with its cost.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the costs of all possible paths; for example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively, determining the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation

The problem is divided into five stages: $n = 5$, $k = 0, 1, 2, 3, 4$.
State Space
The state space is defined for each stage:

$\Omega_0^X = \{A\} = \{0\}$
$\Omega_1^X = \{B, C, D\} = \{0, 1, 2\}$
$\Omega_2^X = \{E, F, G\} = \{0, 1, 2\}$
$\Omega_3^X = \{H, I, J\} = \{0, 1, 2\}$
$\Omega_4^X = \{K\} = \{0\}$
Each node of the problem is defined by a state $X_k$. For example, $X_2 = 1$ corresponds to the node F. In this problem the state space is defined by one variable; it is also possible to have a multi-variable state space, for which $X_k$ would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In this example, the decision is which way to take from the current node in order to reach the next stage. The following notations are used:
$$\Omega_k^U(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \quad \text{for } k = 1, 2, 3$$

$$\Omega_0^U(0) = \{0, 1, 2\} \quad \text{for } k = 0$$

For example, $\Omega_1^U(0) = \Omega^U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition $B \Rightarrow E$ or $U_1(0) = 1$ for the transition $B \Rightarrow F$.

Another example: $\Omega_1^U(2) = \Omega^U(D) = \{1, 2\}$, with $u_1(2) = 1$ for the transition $D \Rightarrow F$ or $u_1(2) = 2$ for the transition $D \Rightarrow G$.
A sequence $\pi = \{\mu_0, \mu_1, \ldots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state $i$ at stage $k$ to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu_0^*, \mu_1^*, \ldots, \mu_N^*\}$.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: $f_k(i, u) = u$.

The transition costs are defined to be equal to the distance from one state to the state resulting from the decision. For example, $C_1(0, 0) = C(B \Rightarrow E) = 4$. The cost function is defined in the same way for the other stages and states.
Objective Function

$$J_0^*(0) = \min_{U_k \in \Omega_k^U(X_k)} \left[ \sum_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k), \quad k = 0, 1, \ldots, N-1$
4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated from the last stage and then iterated backwards until
the initial state is reached. The optimal decision sequence is then obtained forward, using the optimal solutions determined by the DP algorithm for the sequence of states visited.

The solutions of the algorithm are given in Appendix A.

The optimal cost-to-go is $J_0^*(0) = 8$. It corresponds to the path $A \Rightarrow D \Rightarrow G \Rightarrow I \Rightarrow K$. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$, with $\mu_k(i) = u_k^*(i)$ (for example, $\mu_1(1) = 2$ and $\mu_1(2) = 2$).
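The same computation can be scripted. In the sketch below, the arc-cost table is an illustrative assumption (the figure's values did not all survive into this text), chosen to be consistent with the two numbers quoted above: the path A-B-F-J-K costs 17 and the optimum along A-D-G-I-K is 8:

```python
# Backward DP on the five-stage graph.  The arc costs are illustrative
# assumptions consistent with the costs quoted in the text.
arcs = {  # (stage, node) -> {successor: arc cost}
    (0, "A"): {"B": 2, "C": 4, "D": 3},
    (1, "B"): {"E": 4, "F": 6},
    (1, "C"): {"E": 2, "F": 1, "G": 3},
    (1, "D"): {"F": 5, "G": 2},
    (2, "E"): {"H": 2, "I": 5},
    (2, "F"): {"H": 7, "I": 3, "J": 2},
    (2, "G"): {"I": 1, "J": 2},
    (3, "H"): {"K": 4},
    (3, "I"): {"K": 2},
    (3, "J"): {"K": 7},
}

def shortest_path():
    J = {"K": 0.0}                 # cost-to-go of the terminal node
    policy = {}
    for stage in (3, 2, 1, 0):     # backward through the stages
        for (s, node), edges in arcs.items():
            if s != stage:
                continue
            nxt = min(edges, key=lambda j: edges[j] + J[j])
            J[node] = edges[nxt] + J[nxt]
            policy[node] = nxt
    return J, policy
```

Each node's cost-to-go is computed once from the next stage's values, instead of enumerating every path from A to K.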
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory needed for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4: it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable $k \in \{0, \ldots, N\}$ represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on $k$: $X_k \in \Omega_k^X$.
Decision Space
At each decision epoch, the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on
the stage: $u \in \Omega_k^U(i)$.
Dynamic of the System and Transition Probability
In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance $\omega = \omega_k(i, u)$:

$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \ldots, N-1$$

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage $k+1$ is $j$, given that the state and control at stage $k$ are $i$ and $u$. These probabilities can also depend on the stage:

$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

If the system is stationary (time-invariant), the dynamic function $f$ does not depend on time, and the notation for the probability function can be simplified:

$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$
In this case one refers to a Markov decision process. If a control $u$ is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition $(i, j)$ and action $u$. The costs can also depend on the stage:

$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$

If the transition $(i, j)$ occurs at stage $k$ when the decision is $u$, then the cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, the notation is simplified to $C(j, u, i)$.

A terminal cost $C_N(i)$ can be used to penalize deviations from a desired terminal state.
Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:

$$J^*(X_0) = \min_{U_k \in \Omega_k^U(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), \quad k = 0, 1, \ldots, N-1$
$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision (action) at stage $k$
$\omega_k(i, u)$: probabilistic disturbance function
$C_k(j, u, i)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u, \omega)$: dynamic function
$J_0^*(i)$: optimal cost-to-go starting from state $i$
5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} E\left[ C_k(i, u) + J_{k+1}^*(f_k(i, u, \omega)) \right] \qquad (5.1)$$

This equation defines a condition for the cost-to-go function of a state $i$ at stage $k$ to be optimal. The equation can be rewritten using the transition probabilities:

$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \sum_{j \in \Omega_{k+1}^X} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \qquad (5.2)$$
$\Omega_k^X$: state space at stage $k$
$\Omega_k^U(i)$: decision space at stage $k$ for state $i$
$P_k(j, u, i)$: transition probability function
5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage; by backward recursion, it determines at each stage the optimal decision for each state of the system.
$$J_N^*(i) = C_N(i) \quad \forall i \in \Omega_N^X \qquad \text{(initialization)}$$

While $k \ge 0$ do:

$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \sum_{j \in \Omega_{k+1}^X} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \quad \forall i \in \Omega_k^X$$

$$U_k^*(i) = \arg\min_{u \in \Omega_k^U(i)} \sum_{j \in \Omega_{k+1}^X} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \quad \forall i \in \Omega_k^X$$

$$k \leftarrow k - 1$$
$u$: decision variable
$U_k^*(i)$: optimal decision (action) at stage $k$ for state $i$
The recursion finishes when the first stage is reached
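As a sketch, the algorithm can be written out for a hypothetical two-state, two-action instance (a "working/failed" component with a "do nothing/replace" decision; all probabilities and costs below are illustrative assumptions, not thesis data):

```python
# Stochastic finite-horizon value iteration on an illustrative instance.
N = 3
states = [0, 1]                      # 0 = working, 1 = failed
actions = [0, 1]                     # 0 = do nothing, 1 = replace
P = {0: [[0.8, 0.2], [0.0, 1.0]],    # P[u][i][j]: transition probabilities
     1: [[1.0, 0.0], [1.0, 0.0]]}

def cost(j, u, i):                   # C(j, u, i): replace cost + failure cost
    return (10 if u == 1 else 0) + (5 if j == 1 else 0)

CN = [0, 20]                         # terminal penalty for ending failed

J = [None] * (N + 1)
U = [None] * N
J[N] = list(CN)                      # initialization: J*_N(i) = C_N(i)
for k in range(N - 1, -1, -1):       # backward recursion over the stages
    J[k], U[k] = [0.0] * 2, [0] * 2
    for i in states:
        q = {u: sum(P[u][i][j] * (cost(j, u, i) + J[k + 1][j])
                    for j in states) for u in actions}
        U[k][i] = min(q, key=q.get)  # argmin over the admissible actions
        J[k][i] = q[U[k][i]]
```

Here the optimal first-stage decision in the failed state is to replace, since the expected future failure costs outweigh the replacement cost.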
5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• $N$ stages;

• $N_X$ state variables, where the set of values of each state variable has size $S$;

• $N_U$ control variables, where the set of values of each control variable has size $A$.

The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$: the complexity of the problem increases exponentially with the size of the problem (the number of state and decision variables). This characteristic of SDP is called the curse of dimensionality.
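The growth can be made concrete by evaluating the operation count itself (the sizes below are illustrative assumptions):

```python
# Operation count N * S^(2*Nx) * A^(Nu) from the complexity estimate above.
def dp_operations(N, S, Nx, A, Nu):
    return N * S ** (2 * Nx) * A ** Nu

# With S = 10, each additional state variable multiplies the count by S^2 = 100:
counts = [dp_operations(N=10, S=10, Nx=n, A=4, Nu=2) for n in (1, 2, 3)]
```

Going from one to three state variables already turns sixteen thousand operations into 160 million.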
5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time, so a possible state variable for a component is its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be taken into account to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It would also be possible to have different types of failure states, such as major failures and minor failures: minor failures could be cleared by repair, while after a major failure the component should be replaced.
5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is, or can be, subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable; this reduces the uncertainties, but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions at offshore wind farms.
5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few previous states) to overcome this assumption: variables are added to the DP model to keep the previously visited states in memory. The computational price is, once again, very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the previous stage, as it would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models -
Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details, and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation

The state space, decision space, probability function and cost function of an IHSDP model are defined similarly to the FHSDP case, restricted to stationary functions. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form $\pi = \{\mu, \mu, \mu, \ldots\}$, where $\mu$ is a function mapping the state space to the control space: for
$i \in \Omega^X$, $\mu(i)$ is an admissible control for the state $i$: $\mu(i) \in \Omega^U(i)$.
The objective is to find the optimal policy $\mu^*$, i.e. the one that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are paid.

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$

$\mu$: decision policy
$J^*(i)$: optimal cost-to-go function for state $i$
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor $\alpha$, where $\alpha$ is a discount factor ($0 < \alpha < 1$). The cost at stage $k$ has the form $\alpha^k \cdot C_{ij}(u)$.

Since $C_{ij}(u)$ is bounded, the infinite sum converges (it is bounded by a decreasing geometric progression):

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$

$\alpha$: discount factor
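The convergence argument can be written out in one line: if the stage costs are bounded, $|C_{ij}(u)| \le M$ for all $i$, $j$, $u$, then for every policy

```latex
\left| \, \sum_{k=0}^{\infty} \alpha^k \, C_{X_k X_{k+1}}(U_k) \right|
\;\le\; \sum_{k=0}^{\infty} \alpha^k M
\;=\; \frac{M}{1-\alpha} ,
```

so the discounted cost-to-go is finite and different policies can be compared.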
Average cost per stage problems
Some infinite horizon problems can neither be represented with a cost-free termination state nor be discounted.

To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize:

$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$
6.2 Optimality Equations

The optimality equations are formulated using the transition probabilities, here written $P_{ij}(u)$.

The stationary policy $\mu^*$ that solves an IHSDP shortest path problem satisfies Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \quad \forall i \in \Omega^X$$

$J_\mu(i)$: cost-to-go function of policy $\mu$ starting from state $i$
$J^*(i)$: optimal cost-to-go function for state $i$
For an IHSDP discounted problem, the optimality equation is:

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X$$

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown that it converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and $1/(1-\alpha)$.
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be chosen for the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm

Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the
algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is applied iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy $\mu^0$. It can then be described by the following steps:
Step 1: Policy Evaluation

If $\mu^{q+1} = \mu^q$, stop the algorithm. Otherwise, $J_{\mu^q}(i)$ is calculated as the solution of the following linear system:

$$J_{\mu^q}(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J_{\mu^q}(j) \right]$$

$q$: iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system under the policy $\mu^q$.
Step 2: Policy Improvement

A new policy is obtained using one step of the value iteration algorithm:

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^q}(j) \right]$$

Go back to the policy evaluation step. The process stops when $\mu^{q+1} = \mu^q$.
At each iteration the algorithm improves the policy. If the initial policy $\mu^0$ is already good, then the algorithm converges quickly to the optimal solution.
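A compact sketch of the two-step scheme on a hypothetical two-state repair MDP follows (the discount factor, probabilities and costs are illustrative assumptions); the evaluation step solves the linear system exactly with a small Gaussian elimination:

```python
# Policy iteration on an illustrative discounted MDP: state 0 = working,
# state 1 = failed; action 0 = wait, action 1 = repair (cost 5); being
# failed costs 3 per stage; discount factor 0.9.
ALPHA = 0.9
P = {0: [[0.8, 0.2], [0.0, 1.0]],    # P[u][i][j]
     1: [[1.0, 0.0], [1.0, 0.0]]}

def c(i, u):                          # expected one-stage cost
    return (5 if u == 1 else 0) + (3 if i == 1 else 0)

def solve(A, b):                      # Gaussian elimination with pivoting
    n = len(b)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
        for r in range(col + 1, n):
            fac = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= fac * A[col][k]
            b[r] -= fac * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x

def policy_iteration(mu):
    while True:
        n = len(mu)
        # Step 1: policy evaluation -- solve (I - alpha * P_mu) J = c_mu.
        A = [[(1.0 if i == j else 0.0) - ALPHA * P[mu[i]][i][j]
              for j in range(n)] for i in range(n)]
        J = solve(A, [c(i, mu[i]) for i in range(n)])
        # Step 2: policy improvement.
        new_mu = [min((0, 1), key=lambda u: c(i, u) + ALPHA *
                      sum(P[u][i][j] * J[j] for j in range(n)))
                  for i in range(n)]
        if new_mu == mu:              # the policy solves its own improvement
            return mu, J
        mu = new_mu
```

Starting from the "always wait" policy, a single improvement step already yields "repair when failed", which the next iteration confirms as optimal.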
6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, in each policy evaluation step, the value iteration algorithm for a finite number of iterations $M$ to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu^k}(i)$ that must be chosen higher than the real value $J_{\mu^k}(i)$.
While $m \ge 0$ do:

$$J^m_{\mu^k}(i) = \sum_{j \in \Omega^X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right] \quad \forall i \in \Omega^X$$

$$m \leftarrow m - 1$$

$m$: number of iterations left in the evaluation step of the modified policy iteration

The algorithm stops when $m = 0$, and $J_{\mu^k}$ is approximated by $J^0_{\mu^k}$.
6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy $\mu$ and a state $\bar{X} \in \Omega^X$, there are a unique scalar $\lambda_\mu$ and vector $h_\mu$ such that:

$$h_\mu(\bar{X}) = 0$$

$$\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega^X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right] \quad \forall i \in \Omega^X$$

This $\lambda_\mu$ is the average cost-to-go of the stationary policy $\mu$. The average cost-to-go is the same for every starting state.
The optimal average cost and the optimal policy satisfy the Bellman equation:

$$\lambda^* + h^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$

$$\mu^*(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$
6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. $\bar{X}$ is an arbitrary reference state and $h^0(i)$ is chosen
arbitrarily:
$$H^k = \min_{u \in \Omega^U(\bar{X})} \sum_{j \in \Omega^X} P(j, u, \bar{X}) \cdot \left[ C(j, u, \bar{X}) + h^k(j) \right]$$

$$h^{k+1}(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \quad \forall i \in \Omega^X$$

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \quad \forall i \in \Omega^X$$
The sequence $h^k$ converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
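A sketch of the recursion on a hypothetical two-state unichain MDP follows (action 0 switches state, action 1 stays; all costs are illustrative assumptions). The best behavior here is to reach state 0 and stay, so the optimal average cost is 1 per stage:

```python
# Relative value iteration on an illustrative unichain MDP.
P = {0: [[0.0, 1.0], [1.0, 0.0]],    # action 0: switch state
     1: [[1.0, 0.0], [0.0, 1.0]]}    # action 1: stay

def c(i, u):                          # expected one-stage cost
    return {(0, 0): 2, (1, 0): 4, (0, 1): 1, (1, 1): 10}[(i, u)]

REF = 0                               # arbitrary reference state (X-bar)
h = [0.0, 0.0]                        # h^0 chosen arbitrarily
for _ in range(100):
    # One application of the Bellman operator, then re-center at REF.
    T = [min(c(i, u) + sum(P[u][i][j] * h[j] for j in (0, 1))
             for u in (0, 1)) for i in (0, 1)]
    lam = T[REF]                      # estimate of the optimal average cost
    h = [T[i] - lam for i in (0, 1)]  # relative values
```

Subtracting the reference-state value keeps $h^k$ bounded, which is exactly what makes the average-cost recursion converge.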
6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: $\bar{X}$ can be chosen arbitrarily.
Step 1: Policy Evaluation
If $\lambda^{q+1} = \lambda^q$ and $h^{q+1}(i) = h^q(i)$ $\forall i \in \Omega^X$, stop the algorithm. Otherwise, solve the system of equations:

$$h^q(\bar{X}) = 0$$

$$\lambda^q + h^q(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right] \quad \forall i \in \Omega^X$$
Step 2: Policy Improvement

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right] \quad \forall i \in \Omega^X$$

$$q \leftarrow q + 1$$
6.7 Linear Programming

The three types of IHSDP models can be reformulated so as to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that cannot be included in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case the optimal cost-to-go satisfies:

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X$$

$J^*(i)$ is the solution of the following linear programming model:

Maximize $\sum_{i \in \Omega^X} J(i)$

subject to $J(i) - \alpha \sum_{j \in \Omega^X} P(j, u, i) \cdot J(j) \le \sum_{j \in \Omega^X} P(j, u, i) \cdot C(j, u, i) \quad \forall u, i$
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
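As an illustration, the LP can be handed to an off-the-shelf solver. The sketch below uses SciPy's `linprog` (assumed to be available) on a hypothetical two-state discounted MDP; the constraints are exactly those above, one per (state, action) pair:

```python
from scipy.optimize import linprog

# LP solution of an illustrative discounted MDP (two states, two actions).
ALPHA = 0.9
P = {0: [[0.8, 0.2], [0.0, 1.0]],     # action 0: wait
     1: [[1.0, 0.0], [1.0, 0.0]]}     # action 1: repair

def cbar(i, u):                        # expected one-stage cost
    return (5 if u == 1 else 0) + (3 if i == 1 else 0)

n = 2
A_ub, b_ub = [], []
for i in range(n):
    for u in (0, 1):
        # Constraint: J(i) - alpha * sum_j P(j,u,i) J(j) <= cbar(i,u).
        A_ub.append([(1.0 if j == i else 0.0) - ALPHA * P[u][i][j]
                     for j in range(n)])
        b_ub.append(cbar(i, u))

# Maximize sum_i J(i)  <=>  minimize -sum_i J(i).
res = linprog(c=[-1.0] * n, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n, method="highs")
J = res.x                              # J*(0), J*(1)
```

The LP optimum coincides with the fixed point of the discounted Bellman equation for this instance.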
6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

Let $n$ and $m$ denote the numbers of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of $n$ and $m$; a DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy $\mu^0$ is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in
continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite, and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how an SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem through approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to be able to predict the output for any possible input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and that use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6: the system is assumed to be stationary and to be a Markov decision process. However, RL does not require an explicit model of the system to exist. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form $(X_k, X_{k+1}, U_k, C_k)$.

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2: they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or by real-life experience. A sample has the form $(X_k, X_{k+1}, U_k, C_k)$: $X_{k+1}$ is the observed state after choosing the control $U_k$ in state $X_k$, and $C_k = C(X_k, X_{k+1}, U_k)$ is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities $P(j, u, i)$ and costs $C(j, u, i)$.
7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy $\mu$ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6, and can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs observed in the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample of the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and that every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation: Assume that a trajectory $(X_0, \ldots, X_N)$ has been generated according to the policy $\mu$, and that the sequence of transition costs $C(X_k, X_{k+1}) = C(X_k, X_{k+1}, \mu(X_k))$ has been observed.

The cost-to-go resulting from the trajectory, starting from the state $X_k$, is:

$$V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})$$

$V(X_k)$: cost-to-go of the trajectory starting from state $X_k$
If a certain number of trajectories has been generated, and the state $i$ has been visited $K$ times in these trajectories, $J(i)$ can be estimated by:

$$\tilde{J}(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)$$

$V(i_m)$: cost-to-go of the trajectory starting from the $m$-th visit of state $i$
A recursive form of the method can be formulated:

$$\tilde{J}(i) \leftarrow \tilde{J}(i) + \gamma \cdot \left[ V(i_m) - \tilde{J}(i) \right], \quad \gamma = 1/m$$

with $m$ the number of the trajectory. From a trajectory point of view:

$$\tilde{J}(X_k) \leftarrow \tilde{J}(X_k) + \gamma_{X_k} \cdot \left[ V(X_k) - \tilde{J}(X_k) \right]$$

with $\gamma_{X_k}$ corresponding to $1/m$, where $m$ is the number of times $X_k$ has already been visited by trajectories.
With the preceding algorithm, $V(X_k)$ must be calculated from the whole trajectory, and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation $V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1})$.

At each transition of the trajectory, the cost-to-go estimates of the previously visited states are updated. Assume that the $l$-th transition is being generated; then $\tilde{J}(X_k)$ is updated for all the states visited so far during the trajectory:

$$\tilde{J}(X_k) \leftarrow \tilde{J}(X_k) + \gamma_{X_k} \cdot \left[ C(X_l, X_{l+1}) + \tilde{J}(X_{l+1}) - \tilde{J}(X_l) \right] \quad \forall k = 0, \ldots, l$$
TD($\lambda$)
A generalization of the preceding algorithm is TD($\lambda$), where a constant $\lambda < 1$ is introduced:

$$\tilde{J}(X_k) \leftarrow \tilde{J}(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot \left[ C(X_l, X_{l+1}) + \tilde{J}(X_{l+1}) - \tilde{J}(X_l) \right] \quad \forall k = 0, \ldots, l$$

Note that TD(1) is the same as policy evaluation by simulation. Another special case is $\lambda = 0$; the TD(0) update only concerns the current state:

$$\tilde{J}(X_k) \leftarrow \tilde{J}(X_k) + \gamma_{X_k} \cdot \left[ C(X_k, X_{k+1}) + \tilde{J}(X_{k+1}) - \tilde{J}(X_k) \right]$$
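A sketch of TD(0) on a hypothetical stochastic shortest path makes the update concrete (all numbers are illustrative assumptions): from state 0 the chain moves to state 1 with cost 2 or straight to the terminal state with cost 4, each with probability 1/2; state 1 always terminates with cost 3, so the exact values are $J(0) = 4.5$ and $J(1) = 3$:

```python
import random

# TD(0) policy evaluation by simulation on an illustrative chain.
random.seed(0)
J = {0: 0.0, 1: 0.0, "T": 0.0}       # "T" is the cost-free terminal state
visits = {0: 0, 1: 0}

def step(state):                      # one transition under the fixed policy
    if state == 0:
        return (1, 2.0) if random.random() < 0.5 else ("T", 4.0)
    return ("T", 3.0)

for _ in range(20000):                # simulated trajectories from state 0
    x = 0
    while x != "T":
        nxt, cost = step(x)
        visits[x] += 1
        gamma = 1.0 / visits[x]       # step size 1/m, m = number of visits
        J[x] += gamma * (cost + J[nxt] - J[x])   # TD(0) update
        x = nxt
```

With the $1/m$ step-size schedule, the estimates approach the exact cost-to-go values as the number of simulated trajectories grows.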
Q-factors
Once $J_{\mu^k}(i)$ has been estimated using the TD algorithm, it is possible to perform a policy improvement by evaluating the Q-factors, defined by:

$$Q_{\mu^k}(i, u) = \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^k}(j) \right]$$

Note that $C(j, u, i)$ must be known. The improved policy is:

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} Q_{\mu^k}(i, u)$$

This is in fact an approximate version of the policy iteration algorithm, since $J_{\mu^k}$ and $Q_{\mu^k}$ have been estimated from the samples.
7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations we obtain:

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (X_k, U_k, X_{k+1}, C_k) do:

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]
with γ defined as for TD
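The update above can be sketched on a small invented MDP (states, actions and costs below are hypothetical, not from the thesis), with an ε-greedy rule anticipating the exploration/exploitation trade-off discussed next:

```python
import random

# Toy MDP (invented for illustration): "T" is terminal. In state 0,
# action 0 costs 3 and terminates; action 1 costs 1 and leads to state 1,
# where the only action costs 1 and terminates.
# Exact optimum: Q*(0,0) = 3, Q*(0,1) = 2, Q*(1,0) = 1.
ACTIONS = {0: [0, 1], 1: [0]}

def step(i, u):
    if i == 0:
        return ("T", 3.0) if u == 0 else (1, 1.0)
    return ("T", 1.0)

Q = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0}
n = {k: 0 for k in Q}            # visit counts, step size gamma = 1/n

random.seed(2)
epsilon = 0.2                    # exploration rate
for _ in range(5000):
    x = 0
    while x != "T":
        # epsilon-greedy: mostly exploit the current greedy policy
        if random.random() < epsilon:
            u = random.choice(ACTIONS[x])
        else:
            u = min(ACTIONS[x], key=lambda a: Q[(x, a)])
        x_next, c = step(x, u)
        n[(x, u)] += 1
        gamma = 1.0 / n[(x, u)]
        target = c + (0.0 if x_next == "T"
                      else min(Q[(x_next, a)] for a in ACTIONS[x_next]))
        # Q-learning update, stochastic-approximation form of (7.3)
        Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
        x = x_next
```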
The trade-off between exploration and exploitation. Convergence of these algorithms to the optimal solution requires that all pairs (x, u) be tried infinitely often, which is not realistic in practice.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called the greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the preceding section on each sample of experience.

- Building on-line a model of the transition probabilities and the cost function, and then using this model for off-line training of the system through simulation with direct learning.
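The second approach amounts to estimating the transition probabilities and transition costs from observed samples. A minimal sketch, with invented sample data:

```python
from collections import defaultdict

# Build an empirical model from observed samples (X_k, U_k, X_{k+1}, C_k).
# The resulting P_hat and C_hat can then drive off-line simulated training.
counts = defaultdict(lambda: defaultdict(int))   # (i, u) -> {j: count}
cost_sum = defaultdict(float)                    # (i, u, j) -> summed cost

samples = [  # hypothetical observed transitions
    (0, 1, 1, 2.0), (0, 1, 0, 2.0), (0, 1, 1, 2.0), (1, 0, 0, 5.0),
]
for x, u, x_next, c in samples:
    counts[(x, u)][x_next] += 1
    cost_sum[(x, u, x_next)] += c

def P_hat(j, u, i):
    """Empirical transition probability P(j, u, i)."""
    total = sum(counts[(i, u)].values())
    return counts[(i, u)][j] / total if total else 0.0

def C_hat(j, u, i):
    """Empirical mean transition cost C(j, u, i)."""
    nij = counts[(i, u)][j]
    return cost_sum[(i, u, j)] / nij if nij else 0.0
```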
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems, but for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J^μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a parameter vector that is optimized based on the available samples of J^μ. In the tabular representation investigated previously, J^μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Choose a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
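A minimal sketch of the idea (the simulator and the true cost-to-go below are invented for illustration): a linear architecture J̃(i, r) = r0 + r1·i is fitted by least squares to noisy simulated samples, so that only the two parameters in r are stored instead of a table of J(i) for every state:

```python
import random

random.seed(3)

def sample_cost_to_go(i):
    # Pretend simulator: true J_mu(i) = 5 + 0.5*i, observed with noise
    return 5.0 + 0.5 * i + random.gauss(0.0, 0.2)

# Training set of (state feature, sampled cost-to-go) pairs
train = [(i, sample_cost_to_go(i)) for i in range(100)]

# Closed-form least squares for the two parameters r = (r0, r1)
n = len(train)
sx = sum(i for i, _ in train)
sy = sum(y for _, y in train)
sxx = sum(i * i for i, _ in train)
sxy = sum(i * y for i, y in train)
r1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
r0 = (sy - r1 * sx) / n

def J_approx(i):
    # Only r = (r0, r1) is stored; generalizes to states never sampled
    return r0 + r1 * i
```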
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling-horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] a SDP model is proposed to solve a finite horizon generating-unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air-blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined for deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time-consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods
(Columns: Characteristics / Possible application in maintenance optimization / Method / Advantages and disadvantages)

- Finite Horizon Dynamic Programming: model can be non-stationary / short-term maintenance scheduling / Value Iteration / limited state space (number of components).
- Markov Decision Processes: stationary model / classical MDP methods:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization / Value Iteration (VI) / can converge fast for a high discount factor.
  - Discounted: short-term maintenance optimization / Policy Iteration (PI) / faster in general.
  - Shortest path: Linear Programming / possible additional constraints; state space more limited than for VI and PI.
- Approximate Dynamic Programming: can handle large state spaces compared with classical MDP methods / same as MDP, for larger systems / TD-learning, Q-learning / can work without an explicit model.
- Semi-Markov Decision Processes: can optimize the inspection interval / optimization of inspection-based maintenance / same methods as MDP / more complex (average cost-to-go approach).
Chapter 9

A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.
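The backward value iteration used for such finite horizon models can be sketched generically. This is an illustrative skeleton, not the thesis implementation; the model functions (`P`, `C`, `terminal_cost`, `controls`) are placeholders that a concrete model would supply:

```python
# Generic backward induction for a finite-horizon SDP:
# J_N(i) = terminal cost, J_k(i) = min_u sum_j P(j,u,i) * (C(j,u,i,k) + J_{k+1}(j)).
def value_iteration(states, controls, P, C, terminal_cost, N):
    """Return cost-to-go tables J[k][i] and an optimal policy mu[k][i]."""
    J = [dict() for _ in range(N + 1)]
    mu = [dict() for _ in range(N)]
    for i in states:
        J[N][i] = terminal_cost(i)
    for k in range(N - 1, -1, -1):          # backward in time
        for i in states:
            best_u, best_v = None, float("inf")
            for u in controls(i):
                v = sum(P(j, u, i) * (C(j, u, i, k) + J[k + 1][j])
                        for j in states)
                if v < best_v:
                    best_u, best_v = u, v
            J[k][i], mu[k][i] = best_v, best_u
    return J, mu
```

For instance, with a trivial two-state model where each state loops on itself at cost 1 per stage and the terminal cost is zero, the cost-to-go at stage 0 over a 3-stage horizon is 3.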
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices before doing maintenance.

Conversely, if a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and to avoid maintenance during a profitable period. This idea was considered in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers

N_E   Number of electricity scenarios
N_W   Number of working states for the component
N_PM  Number of preventive maintenance states for one component
N_CM  Number of corrective maintenance states for one component

Costs

C_E(s, k)  Electricity cost at stage k for electricity state s
C_I        Cost per stage for interruption
C_PM       Cost per stage of preventive maintenance
C_CM       Cost per stage of corrective maintenance
C_N(i)     Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1_k  Component state at stage k
x2_k  Electricity state at stage k

Probability functions

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state W_i

Sets

Ω_x1     Component state space
Ω_x2     Electricity state space
Ω_U(i)   Decision space for state i

State notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, …, N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, an interruption cost C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, …, N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space

The state vector X_k is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (N_X = 2).

The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k),  x1_k ∈ Ω_x1, x2_k ∈ Ω_x2   (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age T_max is reached; in this case T_max can, for example, correspond to the time when λ(t) exceeds 50%. This second approach was implemented. In both cases the corresponding number of W states is N_W = T_max/Ts, or the closest integer.
[The figure shows the working states W0, …, W4, the preventive maintenance state PM1 and the corrective maintenance states CM1, CM2. Under u = 0, each Wq moves to Wq+1 with probability (1 − Ts·λ(q)) and to CM1 with probability Ts·λ(q); W4 loops on itself. Under u = 1, each Wq moves to PM1. The PM and CM chains progress with probability 1 back to W0.]

Figure 9.1: Example of Markov Decision Process for one component with N_CM = 3, N_PM = 2, N_W = 4. Solid line: u = 0; dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ω_x1 = {W0, …, W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ω_x1 = {W0, …, W_NW, PM1, …, PM_(NPM−1), CM1, …, CM_(NCM−1)}
Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are N_E possible states for this variable, each corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, …, S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
[The figure plots the electricity price (SEK/MWh, between 200 and 500) against the stage for the three scenarios.]

Figure 9.2: Example of electricity scenarios, N_E = 3.
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, …, W_NW}; ∅ otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · P_k(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios over a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                         u   j1      P(j1, u, i1)
Wq, q ∈ {0, …, NW−1}       0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, …, NW−1}       0   CM1     λ(Wq)
W_NW                       0   W_NW    1 − λ(W_NW)
W_NW                       0   CM1     λ(W_NW)
Wq, q ∈ {0, …, NW}         1   PM1     1
PMq, q ∈ {1, …, NPM−2}     ∅   PMq+1   1
PM_(NPM−1)                 ∅   W0      1
CMq, q ∈ {1, …, NCM−2}     ∅   CMq+1   1
CM_(NCM−1)                 ∅   W0      1
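The component chain of Table 9.1 and Figure 9.1 can be sketched in code. The failure-rate values below are hypothetical, and `lam[q]` stands for the per-stage failure probability of state Wq (Ts·λ(Wq) in the notation of the figure):

```python
N_W, N_PM, N_CM = 4, 2, 3                    # as in the Figure 9.1 example
lam = [0.05, 0.10, 0.20, 0.35, 0.50]         # hypothetical, increasing with age

states = [f"W{q}" for q in range(N_W + 1)] + \
         [f"PM{q}" for q in range(1, N_PM)] + \
         [f"CM{q}" for q in range(1, N_CM)]

def P(j, u, i):
    """Component state transition probability, following Table 9.1."""
    if i.startswith("W"):
        q = int(i[1:])
        if u == 1:                           # preventive replacement starts
            return 1.0 if j == "PM1" else 0.0
        nxt = f"W{min(q + 1, N_W)}"          # ages one stage, capped at W_NW
        if j == nxt:
            return 1.0 - lam[q]
        if j == "CM1":                       # failure during the stage
            return lam[q]
        return 0.0
    # PM/CM chains progress deterministically and end in W0
    kind, q = i[:2], int(i[2:])
    last = (N_PM if kind == "PM" else N_CM) - 1
    nxt = "W0" if q == last else f"{kind}{q + 1}"
    return 1.0 if j == nxt else 0.0
```

Each row of the implied transition matrix sums to one, which is a quick sanity check on the model.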
Table 9.2: Example of transition matrices for electricity scenarios

P1_E =
  1    0    0
  0    1    0
  0    0    1

P2_E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3_E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6
Table 9.3: Example of transition probabilities over a 12-stage horizon

Stage (k)     0     1     2     3     4     5     6     7     8     9     10    11
P_k(j2, i2)   P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · C_E(i2, k) (depends on the electricity scenario state i2 and the stage k).

• Cost for maintenance: C_CM or C_PM.

• Cost for interruption: C_I.

Moreover, a terminal cost, denoted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4; notice that i2 is a state variable.

A possible terminal cost C_N(i) is defined for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                         u   j1      C_k(j, u, i)
Wq, q ∈ {0, …, NW−1}       0   Wq+1    G · Ts · C_E(i2, k)
Wq, q ∈ {0, …, NW−1}       0   CM1     C_I + C_CM
W_NW                       0   W_NW    G · Ts · C_E(i2, k)
W_NW                       0   CM1     C_I + C_CM
Wq                         1   PM1     C_I + C_PM
PMq, q ∈ {1, …, NPM−2}     ∅   PMq+1   C_I + C_PM
PM_(NPM−1)                 ∅   W0      C_I + C_PM
CMq, q ∈ {1, …, NCM−2}     ∅   CMq+1   C_I + C_CM
CM_(NCM−1)                 ∅   W0      C_I + C_CM
9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

N_C    Number of components
N_Wc   Number of working states for component c
N_PMc  Number of preventive maintenance states for component c
N_CMc  Number of corrective maintenance states for component c

Costs

C_PMc    Cost per stage of preventive maintenance for component c
C_CMc    Cost per stage of corrective maintenance for component c
C_Nc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, …, NC}   State of component c at the current stage
i_(NC+1)             Electricity state at the current stage
jc, c ∈ {1, …, NC}   Possible state of component c for the next stage
j_(NC+1)             Possible electricity state for the next stage
uc, c ∈ {1, …, NC}   Decision variable for component c

State and Control Space

xc_k, c ∈ {1, …, NC}   State of component c at stage k
xc                     A component state
x_(NC+1)_k             Electricity state at stage k
uc_k                   Maintenance decision for component c at stage k

Probability functions

λc(i)   Failure probability function for component c

Sets

Ω_xc         State space for component c
Ω_x(NC+1)    Electricity state space
Ω_uc(ic)     Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, …, NC}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages, with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.

• An interruption cost C_I is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x1_k, …, x_NC_k, x_(NC+1)_k)   (9.2)

xc_k, c ∈ {1, …, NC}, represents the state of component c, and x_(NC+1)_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for the one-component model.

The state space related to component c is denoted Ω_xc:

xc_k ∈ Ω_xc = {W0, …, W_NWc, PM1, …, PM_(NPMc−1), CM1, …, CM_(NCMc−1)}

Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u1_k, u2_k, …, u_NC_k)   (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, …, NC}: Ω_uc(ic) = {0, 1} if ic ∈ {W0, …, W_NWc}; ∅ otherwise
9.2.4.3 Transition Probability

The component state variables xc are independent of the electricity state x_(NC+1). Consequently:

P(X_{k+1} = j | U_k = U, X_k = i)   (9.4)
= P((j1, …, j_NC), (u1, …, u_NC), (i1, …, i_NC)) · P_k(j_(NC+1), i_(NC+1))   (9.5)

The transition probabilities of the electricity state, P_k(j_(NC+1), i_(NC+1)), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, …, NC}: ic ∈ {W1, …, W_NWc} and uc = 0:

P((j1, …, j_NC), 0, (i1, …, i_NC)) = Π_{c=1}^{NC} P(jc, 0, ic)

Case 2
If one of the components is in maintenance, or preventive maintenance is decided for some component:

P((j1, …, j_NC), (u1, …, u_NC), (i1, …, i_NC)) = Π_{c=1}^{NC} P^c

with P^c =
  P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, …, W_NWc}
  1              if ic ∈ {W1, …, W_NWc}, uc = 0 and jc = ic
  0              otherwise
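The two cases can be sketched as one joint-probability function. This is an illustrative sketch: the per-component transition function `P_c(j, u, i)` and the working-state test `is_working(i)` are placeholders that a concrete model would supply:

```python
def joint_P(j, u, i, P_c, is_working):
    """Joint transition probability for component state tuples.

    j, u, i are tuples over the NC components: next states, decisions,
    current states. P_c(jc, uc, ic) is the per-component transition
    probability; is_working(ic) tells whether state ic is a working state.
    """
    all_up = all(is_working(ic) for ic in i) and not any(u)
    if all_up:
        # Case 1: system operating -- components age independently
        p = 1.0
        for jc, ic in zip(j, i):
            p *= P_c(jc, 0, ic)
        return p
    # Case 2: system down -- maintained/failed components evolve,
    # the remaining working components do not age (frozen state)
    p = 1.0
    for jc, uc, ic in zip(j, u, i):
        if uc == 1 or not is_working(ic):
            p *= P_c(jc, uc, ic)
        else:
            p *= 1.0 if jc == ic else 0.0
    return p
```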
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, …, NC}: ic ∈ {W1, …, W_NWc} and uc = 0:

C((j1, …, j_NC), 0, (i1, …, i_NC)) = G · Ts · C_E(i_(NC+1), k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all maintenance actions:

C((j1, …, j_NC), (u1, …, u_NC), (i1, …, i_NC)) = C_I + Σ_{c=1}^{NC} C^c

with C^c =
  C_CMc  if ic ∈ {CM1, …, CM_(NCMc−1)} or jc = CM1
  C_PMc  if ic ∈ {PM1, …, PM_(NPMc−1)} or jc = PM1
  0      otherwise
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas of issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions possible at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement is the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of dynamic programming was introduced, with finite horizon and infinite horizon stochastic approaches as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of dynamic programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single-state-variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for such an application.
The main limitation of dynamic programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2

The optimal path is thus A-D-G-I, with total cost J*(A) = 3 + 2 + 1 + 2 = 8.
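The backward computation of Appendix A can be reproduced with a short script. The sketch below encodes the stage costs C(t, i, u) read off from the solution above (the variable and function names are my own) and runs backward value iteration, recovering the optimal cost J*(A) = 8.

```python
# Backward value iteration for the shortest-path example.
# C[(t, i, u)]: cost of choosing control u in state i at stage t
# (u is the successor state, except at stage 3 where every arc
# leads to the single terminal state).
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,   # A -> B, C, D
    (1, 0, 0): 4, (1, 0, 1): 6,                 # B -> E, F
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,   # C -> E, F, G
    (1, 2, 1): 5, (1, 2, 2): 2,                 # D -> F, G
    (2, 0, 0): 2, (2, 0, 1): 5,                 # E -> H, I
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,   # F -> H, I, J
    (2, 2, 1): 1, (2, 2, 2): 2,                 # G -> I, J
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,   # H, I, J -> terminal
}
N = 4                                  # number of stages

J = {N: {0: 0}}                        # terminal cost: J*_4(0) = phi(0) = 0
policy = {}
for t in range(N - 1, -1, -1):         # backward over stages 3, 2, 1, 0
    J[t] = {}
    for i in {s for (tt, s, _) in C if tt == t}:
        # cost-to-go of each admissible control u from state i at stage t
        q = {u: c + J[t + 1][u if t < N - 1 else 0]
             for (tt, s, u), c in C.items() if tt == t and s == i}
        policy[(t, i)] = min(q, key=q.get)
        J[t][i] = q[policy[(t, i)]]

print(J[0][0])   # 8, the optimal cost J*(A)
```

Following `policy` forward from stage 0, state 0 reproduces the optimal path A-D-G-I.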
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Computers & Operations Research, 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Contents
Contents XI
1 Introduction 1
1.1 Background 1
1.2 Objective 2
1.3 Approach 2
1.4 Outline 2
2 Maintenance 5
2.1 Types of Maintenance 5
2.2 Maintenance Optimization Models 6
3 Introduction to the Power System 11
3.1 Power System Presentation 11
3.2 Costs 13
3.3 Main Constraints 13
4 Introduction to Dynamic Programming 15
4.1 Introduction 15
4.2 Deterministic Dynamic Programming 18
5 Finite Horizon Models 23
5.1 Problem Formulation 23
5.2 Optimality Equation 25
5.3 Value Iteration Method 25
5.4 The Curse of Dimensionality 26
5.5 Ideas for a Maintenance Optimization Model 26
6 Infinite Horizon Models - Markov Decision Processes 29
6.1 Problem Formulation 29
6.2 Optimality Equations 31
6.3 Value Iteration 31
6.4 The Policy Iteration Algorithm 31
6.5 Modified Policy Iteration 32
6.6 Average Cost-to-go Problems 33
6.7 Linear Programming 34
6.8 Efficiency of the Algorithms 35
6.9 Semi-Markov Decision Process 35
7 Approximate Methods for Markov Decision Process - Reinforcement Learning 37
7.1 Introduction 37
7.2 Direct Learning 38
7.3 Indirect Learning 41
7.4 Supervised Learning 42
8 Review of Models for Maintenance Optimization 43
8.1 Finite Horizon Dynamic Programming 43
8.2 Infinite Horizon Stochastic Models 44
8.3 Reinforcement Learning 45
8.4 Conclusions 45
9 A Proposed Finite Horizon Replacement Model 47
9.1 One-Component Model 47
9.2 Multi-Component Model 55
9.3 Possible Extensions 59
10 Conclusions and Future Work 61
A Solution of the Shortest Path Example 63
Reference List 65
Chapter 1
Introduction
1.1 Background
Market and competition rules have been introduced among power system companies due to the restructuring and deregulation of modern power systems. The generating companies, as well as transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.
Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 2.1).
CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruptions, possible deterioration of the system, human risks or environmental consequences, etc.
PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components, and Condition Based Maintenance (CBM) for the components that are worth monitoring and not too expensive to monitor. These maintenance actions have a cost for unsupplied energy, inspection, repair, replacement, etc.
Efficient maintenance should balance corrective and preventive maintenance to minimize the total costs of maintenance.
The probability of a functional failure for a component is stochastic. The probability depends on the state of the component, resulting from the history of the component (age, intensity of use, external stress such as weather, maintenance actions, human errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that explicitly integrate stochastic behaviors. This feature makes the models interesting and was the starting idea of this work.
1.2 Objective
The main objective of this work is to investigate the use of stochastic dynamicprogramming models for maintenance optimization and identify possible future ap-plications in power systems
1.3 Approach
The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.
The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.
Some SDP models found in the literature were reviewed. Conclusions were drawn about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.
A finite horizon replacement model was developed to illustrate the possible use ofSDP for power system maintenance
1.4 Outline
Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.
Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.
Chapters 4-7 focus on different Dynamic Programming (DP) approaches and the algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP formulation are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods for finite and infinite horizons respectively. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving infinite horizon Dynamic Programming problems using approximate methods.
Chapter 8 gives a review of some maintenance optimization models based on dy-namic programming Conclusions are made about possible use of the differentapproaches in maintenance optimization
Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization
Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.
Chapter 2
Maintenance
The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1, and some maintenance optimization models are reviewed in Section 2.2.
2.1 Types of Maintenance
Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item, intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 2.1 shows a general picture of the different types of maintenance.
Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way to detect or prevent a failure, or when it is not worth doing so.
Preventive maintenance aims at undertaking maintenance actions on a component before it fails, e.g., to avoid the high costs of replacement, unsupplied power delivery and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:
1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.
[Figure: tree diagram - Maintenance splits into Preventive Maintenance, comprising Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM) (continuous, scheduled, or inspection based), and Corrective Maintenance]
Figure 2.1: Maintenance tree, based on [1]
2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods that use diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.
2.2 Maintenance Optimization Models
Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in very high costs.
The aim of maintenance optimization is to balance corrective and preventive maintenance in order to minimize, for example, the total cost of maintenance.
Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influencing factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.
In this section the most common classes of models are described and some referencesare given This short review is based on Chapter 8 of [4]
2.2.1 Age Replacement Policies
Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.
A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.
A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.
An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.
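As an illustration of the basic age replacement trade-off (this sketch is not taken from the references above; all numbers are assumed), the long-run cost per unit time of replacing at age T, with preventive cost cp, corrective cost cf and survival function R(t), is (cp·R(T) + cf·(1-R(T))) / E[min(X,T)]. The snippet below evaluates this for a Weibull lifetime with an increasing failure rate:

```python
import math

def cost_rate(T, cp=1.0, cf=5.0, shape=2.5, scale=10.0, n=1000):
    """Long-run cost per unit time of an age replacement policy at age T.

    cp: preventive replacement cost, cf: corrective (failure) cost.
    The lifetime is Weibull(shape, scale); shape > 1 gives an increasing
    failure rate, the case where the policy makes sense. All values are
    illustrative assumptions.
    """
    R = lambda t: math.exp(-((t / scale) ** shape))   # survival function
    # expected cycle length E[min(X, T)] = integral of R over [0, T]
    dt = T / n
    cycle = sum(0.5 * (R(k * dt) + R((k + 1) * dt)) * dt for k in range(n))
    return (cp * R(T) + cf * (1.0 - R(T))) / cycle

# crude search for the replacement age minimizing the cost rate
candidates = [0.5 * k for k in range(1, 61)]          # ages 0.5 .. 30
T_opt = min(candidates, key=cost_rate)
```

With these made-up costs the optimal age lies well below the mean lifetime: replacing too early pays cp too often, replacing too late pays cf too often.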
2.2.2 Block Replacement Policies
In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.
This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
2.2.3 Condition Based Maintenance
CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.
One question concerns the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.
For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2], the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9], a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.
An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (monitored variables).
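The proportional hazards assumption can be written out explicitly. With baseline hazard $h_0(t)$ and the monitored variables collected in a vector $z$ with coefficients $\gamma$ (notation assumed here, following the usual Cox formulation):

```latex
h(t \mid z) = h_0(t)\, \exp\!\left(\gamma^{\mathsf{T}} z\right)
```

so that the monitored variables scale the time-dependent baseline hazard multiplicatively.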
2.2.4 Opportunistic Maintenance Models
Opportunistic maintenance considers unexpected opportunities to perform preventive maintenance: with the failure of one component, it is possible to perform PM on other components. This can be interesting for offshore wind farms, for example. Travel to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money can be saved.
Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.
A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short-term information. The approach can be used with many maintenance optimization models.
2.2.5 Other Types of Models and Criteria of Classification
Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.
Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g., multi-component models are more interesting in power systems. The time horizon considered in the model is important. Many articles consider an infinite time horizon. More focus should be put on finite horizons since they are more practical. Another characteristic of a model is the time representation: whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.
The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.
3.1 Power System Presentation
Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description
A simple description of the power system include the following main parts
1. Generation: the generation units that produce the power, e.g., hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.
2. Transmission: the transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.
3. Distribution: the distribution system is a voltage level below transmission which is connected to customers. It connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).
4. Consumption: the consumers can be divided into different categories, such as industry, commercial, household, office, agriculture, etc. The costs of interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.
The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids for electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.
The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). The components of the system influence each other. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.
3.1.2 Maintenance in Power Systems
The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).
Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
3.2 Costs
Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: the cost of the maintenance team that performs the maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can force certain maintenance actions to be postponed; e.g., in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation then has a price and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs of an optimization model.

• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
14
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:
An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the future evolution of the system and the possible actions.
Basically in maintenance problems it would mean that maintenance actions haveonly an effect on the state of the system directly after their accomplishment Theydo not influence the deterioration process after they have been completed
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and the action chosen. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of a system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval of time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuous set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The last two possibilities are briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for $N$ stages.
State and Decision Spaces

At each stage $k$, the system is in a state $X_k = i$ that belongs to a state space $\Omega^X_k$. Depending on the state of the system, the decision maker decides on an action $u = U_k \in \Omega^U_k(i)$.

Dynamic and Cost Functions

As a result of this action, the system state at the next stage will be $X_{k+1} = f_k(i, u)$. Moreover, the action has a cost that the decision maker has to pay, $C_k(i, u)$. A possible terminal cost $C_N(X_N)$ is associated with the terminal state (the state at stage $N$).
Objective Function

The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamic of the system:

$$J^*_0(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, \ldots, N-1$.
$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision action at stage $k$
$C_k(i, u)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u)$: dynamic function
$J^*_0(i)$: optimal cost-to-go starting from state $i$
4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage $k$ can be derived with the following formula:

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \qquad (4.1)$$

$J^*_k(i)$: optimal cost-to-go from stage $k$ to $N$, starting from state $i$
The value iteration algorithm is a direct consequence of the optimality equation:

$$J^*_N(i) = C_N(i) \qquad \forall i \in \Omega^X_N$$

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \qquad \forall i \in \Omega^X_k$$

$$U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \qquad \forall i \in \Omega^X_k$$

$u$: decision variable
$U^*_k(i)$: optimal decision action at stage $k$ for state $i$

The algorithm goes backwards, starting from the last stage. It stops when $k = 0$.
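As an illustration, the backward recursion can be sketched in a few lines of Python. The problem data in the usage example (states, dynamics and costs) are hypothetical, not those of the shortest path example of the next section.

```python
# Backward value iteration for a deterministic finite-horizon DP model.
# states[k] lists the admissible states at stage k; controls(k, i) the
# admissible actions; f(k, i, u) the dynamic; C(k, i, u) the stage cost.

def value_iteration(N, states, controls, f, C, C_N):
    """Return the optimal cost-to-go J[k][i] and policy U[k][i]."""
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = C_N(i)                          # initialization: terminal cost
    for k in range(N - 1, -1, -1):                # backwards from stage N-1 to 0
        for i in states[k]:
            # minimize C_k(i,u) + J*_{k+1}(f_k(i,u)) over admissible u
            best_u, best = min(((u, C(k, i, u) + J[k + 1][f(k, i, u)])
                                for u in controls(k, i)), key=lambda t: t[1])
            J[k][i], U[k][i] = best, best_u
    return J, U

# Hypothetical two-stage instance: the control chooses the next state
# directly and the transition cost is 1 + i + u.
J, U = value_iteration(
    N=2,
    states={0: [0], 1: [0, 1], 2: [0, 1]},
    controls=lambda k, i: [0, 1],
    f=lambda k, i, u: u,
    C=lambda k, i, u: 1 + i + u,
    C_N=lambda i: 0)
# for this instance the optimal cost-to-go from the initial state is J[0][0] == 2
```

The nested dictionaries mirror the notation above: `J[k][i]` plays the role of $J^*_k(i)$ and `U[k][i]` the role of $U^*_k(i)$.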
4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: shortest path network with five stages. Stage 0 contains node A; stage 1 nodes B, C, D; stage 2 nodes E, F, G; stage 3 nodes H, I, J; stage 4 node K. Each arc between consecutive stages is labeled with its cost.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation

The problem is divided into five stages: $k = 0, 1, 2, 3, 4$, with $N = 4$.

State Space

The state space is defined for each stage:

$\Omega^X_0 = \{A\} = \{0\}$, $\Omega^X_1 = \{B, C, D\} = \{0, 1, 2\}$, $\Omega^X_2 = \{E, F, G\} = \{0, 1, 2\}$,
$\Omega^X_3 = \{H, I, J\} = \{0, 1, 2\}$, $\Omega^X_4 = \{K\} = \{0\}$
Each node of the problem is defined by a state $X_k$. For example, $X_2 = 1$ corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which $X_k$ would be a vector.

Decision Space

The set of possible decisions must be defined for each state at each stage. In the example, the decision is which arc to take from the current node to the next stage. The following notations are used:

$$\Omega^U_k(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \qquad \text{for } k = 1, 2, 3$$

$$\Omega^U_0(0) = \{0, 1, 2\} \qquad \text{for } k = 0$$

For example, $\Omega^U_1(0) = \Omega^U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition B ⇒ E and $U_1(0) = 1$ for the transition B ⇒ F.

Another example: $\Omega^U_1(2) = \Omega^U(D) = \{1, 2\}$, with $u_1(2) = 1$ for the transition D ⇒ F and $u_1(2) = 2$ for the transition D ⇒ G.
A sequence $\pi = \{\mu_0, \mu_1, \ldots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state $i$ at stage $k$ to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu^*_0, \mu^*_1, \ldots, \mu^*_N\}$.
Dynamic and Cost Functions

The dynamic function of the example is simple thanks to the notations used: $f_k(i, u) = u$.

The transition costs are defined as equal to the distance from one state to the state resulting from the decision. For example, $C_1(0, 0) = C(B ⇒ E) = 4$. The cost function is defined in the same way for the other stages and states.
Objective Function

$$J^*_0(0) = \min_{U_k \in \Omega^U_k(X_k)} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, 1, \ldots, N-1$.
4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.

The solutions of the algorithm are given in Appendix A.

The optimal cost-to-go is $J^*_0(0) = 8$. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$, with $\mu_k(i) = u^*_k(i)$ (for example $\mu_1(1) = 2$, $\mu_1(2) = 2$).
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbances, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as follows:

State Space

A variable $k \in \{0, \ldots, N\}$ represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on $k$: $X_k \in \Omega^X_k$.
Decision Space

At each decision epoch, the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on the stage: $u \in \Omega^U_k(i)$.
Dynamic of the System and Transition Probability

In contrast to the deterministic case, the state transition depends not only on the control used but also on a disturbance $\omega = \omega_k(i, u)$:

$$X_{k+1} = f_k(X_k, U_k, \omega), \qquad k = 0, 1, \ldots, N-1$$

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage $k+1$ is $j$, given that the state and control at stage $k$ are $i$ and $u$. These probabilities can also depend on the stage:

$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

If the system is stationary (time-invariant), the dynamic function $f$ does not depend on time and the notation for the probability function can be simplified:

$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

In this case one refers to a Markov decision process. If a control $u$ is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
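To make this last remark concrete, the snippet below (with invented numbers for a three-state component) fixes one control per state and checks that the induced Markov chain's transition matrix is row-stochastic.

```python
# Hypothetical 3-state system with transition probabilities P[(j, u, i)],
# i.e. P(X_{k+1} = j | X_k = i, U_k = u). Fixing a control mu(i) for
# every state i yields an ordinary Markov chain.

P = {
    (0, 'repair', 0): 1.0, (1, 'repair', 0): 0.0, (2, 'repair', 0): 0.0,
    (0, 'wait',   0): 0.7, (1, 'wait',   0): 0.3, (2, 'wait',   0): 0.0,
    (0, 'repair', 1): 0.9, (1, 'repair', 1): 0.1, (2, 'repair', 1): 0.0,
    (0, 'wait',   1): 0.0, (1, 'wait',   1): 0.6, (2, 'wait',   1): 0.4,
    (0, 'repair', 2): 0.8, (1, 'repair', 2): 0.0, (2, 'repair', 2): 0.2,
    (0, 'wait',   2): 0.0, (1, 'wait',   2): 0.0, (2, 'wait',   2): 1.0,
}
mu = {0: 'wait', 1: 'wait', 2: 'repair'}   # one fixed control per state

# Transition matrix of the Markov chain induced by the stationary policy mu
matrix = [[P[(j, mu[i], i)] for j in range(3)] for i in range(3)]

for row in matrix:
    assert abs(sum(row) - 1.0) < 1e-9       # each row is a distribution
```

The resulting `matrix` is exactly the Markov model mentioned above: once the policy is fixed, the decision problem reduces to a Markov chain.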
Cost Function

A cost is associated with each possible transition $(i, j)$ and action $u$. The costs can also depend on the stage:

$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$

If the transition $(i, j)$ occurs at stage $k$ when the decision is $u$, then the cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, the notation is simplified to $C(j, u, i)$.

A terminal cost $C_N(i)$ can be used to penalize deviations from a desired terminal state.
Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:

$$J^*(X_0) = \min_{U_k \in \Omega^U_k(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k))$, $k = 0, 1, \ldots, N-1$.
$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision action at stage $k$
$\omega_k(i, u)$: probabilistic function of the disturbance
$C_k(j, u, i)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u, \omega)$: dynamic function
$J^*_0(i)$: optimal cost-to-go starting from state $i$
5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} E\left[ C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega)) \right] \qquad (5.1)$$

This equation defines a condition for the cost-to-go function of a state $i$ at stage $k$ to be optimal. The equation can be rewritten using the transition probabilities:

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \qquad (5.2)$$

$\Omega^X_k$: state space at stage $k$
$\Omega^U_k(i)$: decision space at stage $k$ for state $i$
$P_k(j, u, i)$: transition probability function
5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on Equation 5.2. The algorithm starts at the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system:

$$J^*_N(i) = C_N(i) \qquad \forall i \in \Omega^X_N \quad \text{(initialization)}$$

While $k \geq 0$ do

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \qquad \forall i \in \Omega^X_k$$

$$U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \qquad \forall i \in \Omega^X_k$$

$$k \leftarrow k - 1$$
$u$: decision variable
$U^*_k(i)$: optimal decision action at stage $k$ for state $i$

The recursion finishes when the first stage is reached.
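A minimal Python sketch of this backward recursion is given below; the two-state repair model used to exercise it is hypothetical (state 0 working, state 1 failed; action 0 wait, action 1 repair).

```python
# Stochastic finite-horizon value iteration. P[i][u][j] is the transition
# probability and C[i][u][j] the transition cost; `terminal` gives C_N.

def stochastic_vi(N, n_states, actions, P, C, terminal):
    J = [[0.0] * n_states for _ in range(N + 1)]
    U = [[None] * n_states for _ in range(N)]
    J[N] = [terminal(i) for i in range(n_states)]
    for k in range(N - 1, -1, -1):                 # backward recursion
        for i in range(n_states):
            best = None
            for u in actions(i):
                # expected cost of u: sum_j P(j,u,i) * [C(j,u,i) + J_{k+1}(j)]
                q = sum(P[i][u][j] * (C[i][u][j] + J[k + 1][j])
                        for j in range(n_states))
                if best is None or q < best[0]:
                    best = (q, u)
            J[k][i], U[k][i] = best
    return J, U

# Hypothetical data: waiting is free but the component may fail;
# repairing costs 5 from the working state and 15 from the failed state.
P = [[[0.8, 0.2], [1.0, 0.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
C = [[[0, 0], [5, 5]],
     [[10, 10], [15, 15]]]
J, U = stochastic_vi(2, 2, lambda i: [0, 1], P, C, lambda i: 0.0)
```

For this instance the recursion gives `J[0] == [2.0, 15.0]`: waiting is optimal in the working state and repairing in the failed state at the first stage.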
5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• $N$ stages

• $N_X$ state variables; the size of the set for each state variable is $S$

• $N_U$ control variables; the size of the set for each control variable is $A$

The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
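To give an order of magnitude, the short computation below evaluates this bound for invented problem sizes: four state variables with ten values each and two control variables with five values each.

```python
# Operation count N * S**(2*N_X) * A**N_U for hypothetical problem sizes.
N, S, N_X, A, N_U = 52, 10, 4, 5, 2   # e.g. weekly stages over one year

n_states = S ** N_X                   # 10**4 = 10,000 states
n_actions = A ** N_U                  # 5**2  = 25 actions per state
operations = N * S ** (2 * N_X) * A ** N_U

assert n_states == 10_000 and n_actions == 25
assert operations == 130_000_000_000  # ~1.3e11 elementary operations
```

Even these modest variable counts already put an exact solution at the limit of what is practical, which is the point of the curse of dimensionality.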
5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered in order to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
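As a small illustration (all sizes hypothetical), a single-component state space combining a discretized age, a deterioration level and the two types of failure states could be enumerated as follows:

```python
from itertools import product

ages = range(11)                          # age in years, one stage per year
levels = ["new", "worn", "critical"]      # deterioration levels from inspection
failures = ["minor_failure", "major_failure"]

# A working state pairs an age with a deterioration level; the failure
# states are kept separate since they trigger repair or replacement.
states = [(a, lvl) for a, lvl in product(ages, levels)] + failures
assert len(states) == 11 * 3 + 2          # 35 states for one component
```

The product structure also shows how quickly multi-component models grow: with several such components the joint state space is the Cartesian product of the individual ones.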
5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model on its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.
5.5.3 Time Lags

An important assumption of a DP model is that the dynamic of the system depends only on the current state of the system (and possibly on the time, if the system dynamic is not stationary).

This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamic of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamic of the system, the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form $\pi = \{\mu, \mu, \mu, \ldots\}$, where $\mu$ is a function mapping the state space to the control space: for $i \in \Omega^X$, $\mu(i)$ is an admissible control for the state $i$, $\mu(i) \in \Omega^U(i)$.

The objective is to find the optimal $\mu^*$. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.
Stochastic shortest path models

Stochastic shortest path dynamic programming models have a cost-free terminal state that the system cannot avoid reaching. When this state is reached, the system remains in it and no further costs are paid:

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$

$\mu$: decision policy
$J^*(i)$: optimal cost-to-go function for state $i$
Discounted problems

Discounted IHSDP models have a cost function that is discounted by a factor $\alpha$, where $\alpha$ is a discount factor ($0 < \alpha < 1$). The cost incurred at stage $k$ has the form $\alpha^k \cdot C_{ij}(u)$.

As $C_{ij}(u)$ is bounded, the infinite sum will converge (decreasing geometric progression):

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$

$\alpha$: discount factor
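The convergence claim can be made precise: if the stage costs are bounded, $|C_{ij}(u)| \le C_{\max}$, then for any policy $\mu$ the discounted sum is bounded by a geometric series:

```latex
\left| J_\mu(X_0) \right|
  \;\le\; \sum_{k=0}^{\infty} \alpha^k \, C_{\max}
  \;=\; \frac{C_{\max}}{1-\alpha} \;<\; \infty
```

This bound also explains why methods for discounted problems are fast when $\alpha$ is not too close to 1: the effective horizon scales like $1/(1-\alpha)$.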
Average cost per stage problems

Infinite horizon problems can sometimes be neither represented with a cost-free termination state nor discounted.

To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize:

$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$
6.2 Optimality Equations

The optimality equations are formulated using the probability function $P(j, u, i)$.

The stationary policy $\mu^*$ that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \qquad \forall i \in \Omega^X$$

$J_\mu(i)$: cost-to-go function of policy $\mu$ starting from state $i$
$J^*(i)$: optimal cost-to-go function for state $i$
For an IHSDP discounted problem, the optimality equation is:

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \qquad \forall i \in \Omega^X$$

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and $1/(1-\alpha)$.

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm

Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy $\mu^0$. Then it can be described by the following steps:
Step 1: Policy Evaluation

If $\mu^{q+1} = \mu^q$, stop the algorithm. Else, $J_{\mu^q}(i)$, the solution of the following linear system, is calculated:

$$J_{\mu^q}(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J_{\mu^q}(j) \right]$$

$q$: iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy $\mu^q$.
Step 2: Policy Improvement

A new policy is obtained using one step of the value iteration algorithm:

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^q}(j) \right]$$

Go back to the policy evaluation step.

The process stops when $\mu^{q+1} = \mu^q$.
At each iteration the algorithm improves the policy. If the initial policy $\mu^0$ is already good, the algorithm will converge quickly to the optimal solution.
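The two steps can be sketched in Python for a discounted problem. The two-state repair model is hypothetical, and for brevity the evaluation step solves the linear system by fixed-point iteration rather than a direct linear solver.

```python
# Policy iteration for a discounted MDP. P[i][u][j] is the transition
# probability and C[i][u][j] the transition cost.

def policy_iteration(n, actions, P, C, alpha, tol=1e-12):
    mu = {i: actions(i)[0] for i in range(n)}      # arbitrary initial policy
    while True:
        # Step 1: policy evaluation of mu, J = C_mu + alpha * P_mu J,
        # solved here by fixed-point iteration (alpha < 1 guarantees it)
        J = [0.0] * n
        while True:
            J_new = [sum(P[i][mu[i]][j] * (C[i][mu[i]][j] + alpha * J[j])
                         for j in range(n)) for i in range(n)]
            done = max(abs(a - b) for a, b in zip(J, J_new)) < tol
            J = J_new
            if done:
                break
        # Step 2: policy improvement (one-step lookahead on J_mu)
        mu_new = {i: min(actions(i), key=lambda u: sum(
                      P[i][u][j] * (C[i][u][j] + alpha * J[j])
                      for j in range(n)))
                  for i in range(n)}
        if mu_new == mu:            # policy is a solution of its own improvement
            return mu, J
        mu = mu_new

# Hypothetical two-state repair model (0 = working, 1 = failed)
P = [[[0.8, 0.2], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]]
C = [[[0, 0], [5, 5]], [[10, 10], [15, 15]]]
mu, J = policy_iteration(2, lambda i: [0, 1], P, C, alpha=0.9)
```

For this instance the algorithm terminates after a few improvement steps with the policy "wait while working, repair when failed".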
6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations $M$ to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu^k}(i)$ that must be chosen higher than the real value $J_{\mu^k}(i)$.
While $m \geq 0$ do

$$J^m_{\mu^k}(i) = \sum_{j \in \Omega^X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right] \qquad \forall i \in \Omega^X$$

$$m \leftarrow m - 1$$

$m$: number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when $m = 0$, and $J_{\mu^k}$ is approximated by $J^0_{\mu^k}$.
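For a discounted problem, the approximate evaluation step can be sketched as follows (the data and the choice of M are hypothetical; note that the initial guess deliberately overestimates the true cost-to-go):

```python
# M-sweep approximate policy evaluation, the core of modified policy
# iteration, written here for a discounted model.

def approx_evaluation(n, mu, P, C, alpha, J_init, M):
    """Apply M sweeps of the fixed-policy Bellman operator from J_init."""
    J = list(J_init)
    for _ in range(M):
        J = [sum(P[i][mu[i]][j] * (C[i][mu[i]][j] + alpha * J[j])
                 for j in range(n)) for i in range(n)]
    return J

# Hypothetical two-state repair model; policy: wait in 0, repair in 1.
P = [[[0.8, 0.2], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]]
C = [[[0, 0], [5, 5]], [[10, 10], [15, 15]]]
J = approx_evaluation(2, {0: 0, 1: 1}, P, C, alpha=0.9,
                      J_init=[100.0, 100.0], M=200)
```

Because the fixed-policy operator is a contraction with modulus $\alpha$, the error of the initial overestimate shrinks by a factor $\alpha$ at every sweep; smaller values of M trade evaluation accuracy for speed.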
6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy $\mu$ and a reference state $\bar{X} \in \Omega^X$, there are a unique $\lambda_\mu$ and vector $h_\mu$ such that:

$$h_\mu(\bar{X}) = 0$$

$$\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega^X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right] \qquad \forall i \in \Omega^X$$

This $\lambda_\mu$ is the average cost-to-go for the stationary policy $\mu$. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation:

$$\lambda^* + h^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \qquad \forall i \in \Omega^X$$

$$\mu^*(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \qquad \forall i \in \Omega^X$$
6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. $\bar{X}$ is an arbitrary reference state and $h^0(i)$ is chosen arbitrarily:

$$H^k = \min_{u \in \Omega^U(\bar{X})} \sum_{j \in \Omega^X} P(j, u, \bar{X}) \cdot \left[ C(j, u, \bar{X}) + h^k(j) \right]$$

$$h^{k+1}(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \qquad \forall i \in \Omega^X$$

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \qquad \forall i \in \Omega^X$$
The sequence $h^k$ will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.
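A Python sketch of relative value iteration is given below, run on the same kind of hypothetical two-state repair model (here without discounting, so the relevant quantity is the average cost per stage).

```python
# Relative value iteration for an average-cost unichain MDP.
# x_bar is the arbitrary reference state; the offset H^k converges to
# the optimal average cost per stage lambda*.

def relative_vi(n, actions, P, C, x_bar=0, iters=500):
    h = [0.0] * n
    lam = 0.0
    for _ in range(iters):
        def q(i, u):                   # one-step lookahead on the current h
            return sum(P[i][u][j] * (C[i][u][j] + h[j]) for j in range(n))
        lam = min(q(x_bar, u) for u in actions(x_bar))          # H^k
        h = [min(q(i, u) for u in actions(i)) - lam for i in range(n)]
    def q(i, u):
        return sum(P[i][u][j] * (C[i][u][j] + h[j]) for j in range(n))
    mu = {i: min(actions(i), key=lambda u: q(i, u)) for i in range(n)}
    return lam, h, mu

# Hypothetical model: 0 = working, 1 = failed; action 0 = wait, 1 = repair.
P = [[[0.8, 0.2], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]]
C = [[[0, 0], [5, 5]], [[10, 10], [15, 15]]]
lam, h, mu = relative_vi(2, lambda i: [0, 1], P, C)
```

For this instance the iteration settles on the policy "wait, then repair on failure" with an average cost of 2.5 per stage, and $h(\bar{X}) = h(0) = 0$ as required.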
6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialization: the reference state $\bar{X}$ can be chosen arbitrarily.

Step 1: Policy evaluation

If $\lambda^{q+1} = \lambda^q$ and $h^{q+1}(i) = h^q(i)$ $\forall i \in \Omega^X$, stop the algorithm. Else, solve the system of equations:

$$h^q(\bar{X}) = 0$$

$$\lambda^q + h^q(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right] \qquad \forall i \in \Omega^X$$

Step 2: Policy improvement

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right] \qquad \forall i \in \Omega^X$$

$$q \leftarrow q + 1$$
6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case the optimality equation is:

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \qquad \forall i \in \Omega^X$$

$J^*(i)$ is the solution of the following linear programming model:

Maximize $\sum_{i \in \Omega^X} J(i)$

subject to $J(i) - \alpha \sum_{j \in \Omega^X} P(j, u, i) \cdot J(j) \leq \sum_{j \in \Omega^X} P(j, u, i) \cdot C(j, u, i)$ for all $i$ and $u$.
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

Let $n$ and $m$ denote the numbers of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of $n$ and $m$. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. But linear programming methods become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, the algorithm converges quite fast if the initial policy $\mu^0$ is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Processes

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite, and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Process - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input/output pairs) in order to predict the future output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form $(X_k, X_{k+1}, U_k, C_k)$.
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form $(X_k, X_{k+1}, U_k, C_k)$: $X_{k+1}$ is the observed state after choosing the control $U_k$ in state $X_k$, and $C_k = C(X_k, X_{k+1}, U_k)$ is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities $P(j, u, i)$ and costs $C(j, u, i)$ if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average-cost-to-go problems.
Policy evaluation by simulation. Assume that a trajectory (X0, ..., XN) has been generated according to the policy μ, and that the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.

The cost-to-go resulting from the trajectory starting from the state Xk is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

where V(Xk) is the cost-to-go of a trajectory starting from state Xk.
If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(im)

where V(im) is the cost-to-go of the trajectory starting from state i after its mth visit.
A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(im) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view,

J(Xk) = J(Xk) + γXk · [V(Xk) − J(Xk)]

with γXk corresponding to 1/m, where m is the number of times Xk has already been visited by trajectories.
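As a sketch, the recursive averaging above can be written in a few lines of Python. This is a hypothetical illustration, not part of the thesis: the `simulate_episode` interface and the state names are assumptions, and the update uses every visit of a state with the step size γ = 1/m.

```python
def mc_policy_evaluation(simulate_episode, num_episodes):
    """Monte Carlo estimate of the cost-to-go J(i) of a fixed policy.

    simulate_episode() must return one trajectory generated under the
    policy, as a list [(X0, C0), (X1, C1), ...] of (state, transition
    cost) pairs ending just before the terminal state.
    """
    J = {}       # estimated cost-to-go per state
    visits = {}  # number of samples collected per state
    for _ in range(num_episodes):
        traj = simulate_episode()
        tail = 0.0  # cost-to-go of the remaining trajectory
        for state, cost in reversed(traj):
            tail += cost
            m = visits.get(state, 0) + 1
            visits[state] = m
            gamma = 1.0 / m  # step size gamma = 1/m gives exact averaging
            J[state] = J.get(state, 0.0) + gamma * (tail - J.get(state, 0.0))
    return J
```

With γ = 1/m the recursion reproduces the sample average of the observed costs-to-go, as in the batch formula.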
With the preceding algorithm, V(Xk) is calculated from the whole trajectory, and can thus be used only once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).
At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assuming that the lth transition has just been generated, J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) = J(Xk) + γXk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l
TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J(Xk) = J(Xk) + γXk · λ^(l−k) · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; only the term k = l then remains, and the TD(0) algorithm is

J(Xk) = J(Xk) + γXk · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
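A minimal sketch of the TD(0) update in Python (a hypothetical illustration; the state names are assumptions, and the step size γXk = 1/m is kept from above, with m the visit count of Xk):

```python
def td0_update(J, step_counts, transition):
    """Apply one TD(0) update from an observed transition (Xk, Ck, Xk1).

    J is a dict of cost-to-go estimates; unseen states (including the
    terminal state) implicitly have J = 0.
    """
    Xk, C, Xk1 = transition
    m = step_counts.get(Xk, 0) + 1
    step_counts[Xk] = m
    gamma = 1.0 / m
    # temporal difference: observed cost plus next estimate minus current
    d = C + J.get(Xk1, 0.0) - J.get(Xk, 0.0)
    J[Xk] = J.get(Xk, 0.0) + gamma * d
    return J
```

On a deterministic two-step chain A → B → terminal with costs 1 and 2, updating B first and then A yields J(A) = 3, J(B) = 2 immediately.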
Q-factors. Once J^μk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^μk(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J^μk(j)]

Note that C(j, u, i) must be known. The improved policy is

μk+1(i) = argmin_{u∈ΩU(i)} Q^μk(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^μk and Q^μk have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do:

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) = (1 − γ) · Q(Xk, Uk) + γ · [C(Xk, Xk+1, Uk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]
with γ defined as for TD
The trade-off between exploration and exploitation. Convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.
In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
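The Q-learning update and a simple ε-greedy exploration scheme can be sketched together as follows. The interface is a hypothetical illustration, not from the thesis: `env_step(x, u)` samples one transition, `actions(x)` lists the admissible controls, and the step size is γ = 1/m with m the visit count of the pair (x, u).

```python
import random

def q_learning(env_step, actions, start_state, is_terminal,
               num_episodes=50, eps=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    env_step(x, u) -> (next_state, cost) samples one transition;
    actions(x) lists the admissible controls in state x.
    """
    Q, visits = {}, {}
    for _ in range(num_episodes):
        x = start_state
        while not is_terminal(x):
            # exploration / exploitation trade-off
            if random.random() < eps:
                u = random.choice(actions(x))          # explore
            else:
                u = min(actions(x), key=lambda a: Q.get((x, a), 0.0))  # greedy
            x1, c = env_step(x, u)
            m = visits.get((x, u), 0) + 1
            visits[(x, u)] = m
            gamma = 1.0 / m
            q_next = 0.0 if is_terminal(x1) else min(
                Q.get((x1, a), 0.0) for a in actions(x1))
            Q[(x, u)] = (1 - gamma) * Q.get((x, u), 0.0) + gamma * (c + q_next)
            x = x1
    return Q
```

On a deterministic toy problem the Q-factors converge to the exact transition costs after each pair has been tried once.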
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

• using the direct learning approach presented in the previous section on each sample of experience;

• building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.
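The model-building step can be sketched as straightforward empirical estimation from the observed samples (a hypothetical illustration; the sample tuples follow the (Xk, Xk+1, Uk, Ck) convention above):

```python
from collections import defaultdict

def estimate_model(samples):
    """Estimate transition probabilities P(j, u, i) and mean transition
    costs C(j, u, i) from observed samples (Xk, Xk1, Uk, Ck)."""
    counts = defaultdict(int)      # occurrences of (i, u, j)
    totals = defaultdict(int)      # occurrences of (i, u)
    cost_sums = defaultdict(float) # accumulated cost per (i, u, j)
    for i, j, u, c in samples:
        counts[(i, u, j)] += 1
        totals[(i, u)] += 1
        cost_sums[(i, u, j)] += c
    P = {k: counts[k] / totals[(k[0], k[1])] for k in counts}
    C = {k: cost_sums[k] / counts[k] for k in counts}
    return P, C
```

The estimated P and C can then be fed to the classical methods of Chapter 6, or used to generate further simulated samples for direct learning.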
7.4 Supervised Learning
With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces, they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J^μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J^μ. In the tabular representation investigated previously, J^μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).
There are many possible methods for function approximation. This field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem can be:
• Determine an adequate structure for the approximated function, and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a true training set does not exist. The training sets are obtained either from simulation or from real-time samples, which is already an approximation of the real function.
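As a minimal illustration of fitting an approximation structure, consider a two-parameter affine approximation J̃(i, r) = r0 + r1·i for a scalar state i, fitted to sampled cost-to-go values by least squares. This example is not from the thesis; the closed-form 2×2 normal equations are used for simplicity.

```python
def fit_affine_cost_to_go(samples):
    """Least-squares fit of J~(i, r) = r0 + r1 * i to sampled
    (state, cost-to-go) pairs, via the 2x2 normal equations."""
    n = len(samples)
    s1 = sum(i for i, _ in samples)       # sum of states
    s2 = sum(i * i for i, _ in samples)   # sum of squared states
    t0 = sum(v for _, v in samples)       # sum of sampled values
    t1 = sum(i * v for i, v in samples)   # cross term
    det = n * s2 - s1 * s1
    r0 = (s2 * t0 - s1 * t1) / det
    r1 = (n * t1 - s1 * t0) / det
    return r0, r1
```

Only the two numbers (r0, r1) are stored, instead of one table entry per state; richer feature vectors lead to larger (but still compact) parameter vectors r.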
Chapter 8

Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built, using the state probabilities and the optimal mean time to preventive maintenance calculated.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDPs have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants, the main advantage given being the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary.
  Possible application in maintenance optimization: short-term maintenance optimization and scheduling.
  Methods: value iteration.
  Advantages/disadvantages: limited state space (number of components).

Markov Decision Processes
  Characteristics: stationary model; average cost-to-go, discounted, or shortest path formulations.
  Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization.
  Methods (classical methods for MDPs): Value Iteration (VI), which can converge fast for a high discount factor; Policy Iteration (PI), faster in general; Linear Programming, which allows additional constraints.
  Advantages/disadvantages: state space limited for VI and PI.

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval (average cost-to-go approach).
  Possible application in maintenance optimization: optimization of inspection-based maintenance.
  Methods: same as MDPs.
  Advantages/disadvantages: more complex.

Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than classical MDP methods.
  Possible application in maintenance optimization: same as MDPs, for larger systems.
  Methods: TD-learning, Q-learning.
  Advantages/disadvantages: can work without an explicit model.
Chapter 9

A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component, and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that can influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
Conversely, if a high electricity price is expected in the near future, it can be interesting to do maintenance immediately, to be operational later and avoid maintenance in a profitable period. This idea was adopted in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation can be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption can be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers

NE: number of electricity scenarios
NW: number of working states for the component
NPM: number of preventive maintenance states for the component
NCM: number of corrective maintenance states for the component

Costs

CE(s, k): electricity cost at stage k for the electricity state s
CI: cost per stage for interruption
CPM: cost per stage of preventive maintenance
CCM: cost per stage of corrective maintenance
CN(i): terminal cost if the component is in state i

Variables

i1: component state at the current stage
i2: electricity state at the current stage
j1: possible component state for the next stage
j2: possible electricity state for the next stage

State and Control Space

x1k: component state at stage k
x2k: electricity state at stage k

Probability functions

λ(t): failure rate of the component at age t
λ(i): failure rate of the component in state Wi

Sets

Ωx1: component state space
Ωx2: electricity state space
ΩU(i): decision space for state i

State notations

W: working state
PM: preventive maintenance state
CM: corrective maintenance state
9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost of interruption, CI per stage, is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario; NX = 2. The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component, and Ωx2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by the state variable x1k. There are three types of possible states for this variable: normal states (W), when the component is working; corrective maintenance (CM) states, if the component is in maintenance due to failure; and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; Tmax can then correspond, for example, to the time beyond which λ(t) > 50%. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
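The discretization of λ(t) into per-stage failure probabilities, with λ held constant beyond Tmax as described above, can be sketched as follows (a hypothetical helper, not from the thesis):

```python
def stage_failure_probabilities(failure_rate, Ts, T_max):
    """Per-stage failure probabilities for the working states W0..W_NW.

    failure_rate(t) is the continuous failure-rate function lambda(t);
    the probability of failing during a stage spent in Wq is approximated
    by Ts * lambda(q * Ts), and lambda is held constant for ages >= T_max.
    """
    NW = round(T_max / Ts)  # number of W states (closest integer)
    probs = [min(1.0, Ts * failure_rate(min(q * Ts, T_max)))
             for q in range(NW + 1)]
    return NW, probs
```

The returned list gives λ(Wq) for q = 0, ..., NW, i.e. the quantities appearing in Figure 9.1 and Table 9.1.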
Figure 9.1: Example of a Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state

Electricity scenarios are associated with the state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed, and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3 (electricity price in SEK/MWh as a function of the stage, for scenarios 1-3).
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}; ΩU(i) = ∅ otherwise.
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage, and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1. Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios over a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices, and j2 by the columns.
Table 9.1: Transition probabilities

i1                          | u | j1    | P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      | 0 | Wq+1  | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      | 0 | CM1   | λ(Wq)
WNW                         | 0 | WNW   | 1 − λ(WNW)
WNW                         | 0 | CM1   | λ(WNW)
Wq, q ∈ {0, ..., NW}        | 1 | PM1   | 1
PMq, q ∈ {1, ..., NPM−2}    | ∅ | PMq+1 | 1
PMNPM−1                     | ∅ | W0    | 1
CMq, q ∈ {1, ..., NCM−2}    | ∅ | CMq+1 | 1
CMNCM−1                     | ∅ | W0    | 1
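Table 9.1 can be turned directly into code. The sketch below is an illustration, not from the thesis: states are represented as tuples such as ('W', 2), ('PM', 1) or ('CM', 1), and lam[q] is the per-stage failure probability λ(Wq).

```python
def component_transition(state, u, lam, NW, NPM, NCM):
    """Next-state distribution for the component state (Table 9.1).

    u = 1 requests preventive replacement; maintenance states have a
    forced transition (empty decision space, u ignored). PM_NPM and
    CM_NCM are identified with the new-component state ('W', 0).
    """
    kind, q = state
    if kind == 'W':
        if u == 1:
            return {(('PM', 1) if NPM > 1 else ('W', 0)): 1.0}
        ageing = ('W', min(q + 1, NW))              # age, capped at W_NW
        failed = ('CM', 1) if NCM > 1 else ('W', 0)
        return {ageing: 1.0 - lam[q], failed: lam[q]}
    if kind == 'PM':
        return {(('PM', q + 1) if q < NPM - 1 else ('W', 0)): 1.0}
    if kind == 'CM':
        return {(('CM', q + 1) if q < NCM - 1 else ('W', 0)): 1.0}
```

For the example of Figure 9.1 (NW = 4, NPM = 2, NCM = 3), the function reproduces the solid and dashed transitions of the figure.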
Table 9.2: Example of transition matrices for the electricity scenarios

P1E = [ 1    0    0   ]    P2E = [ 1/3  1/3  1/3 ]    P3E = [ 0.6  0.2  0.2 ]
      [ 0    1    0   ]          [ 1/3  1/3  1/3 ]          [ 0.2  0.6  0.2 ]
      [ 0    0    1   ]          [ 1/3  1/3  1/3 ]          [ 0.2  0.2  0.6 ]
Table 9.3: Example of transition probabilities over a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):  P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k);

• cost of maintenance: CCM or CPM;

• cost of interruption: CI.

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon; a possible terminal cost is defined by CN(i) for each possible terminal state i of the component. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4: Transition costs

i1                          | u | j1    | Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      | 0 | Wq+1  | G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      | 0 | CM1   | CI + CCM
WNW                         | 0 | WNW   | G · Ts · CE(i2, k)
WNW                         | 0 | CM1   | CI + CCM
Wq                          | 1 | PM1   | CI + CPM
PMq, q ∈ {1, ..., NPM−2}    | ∅ | PMq+1 | CI + CPM
PMNPM−1                     | ∅ | W0    | CI + CPM
CMq, q ∈ {1, ..., NCM−2}    | ∅ | CMq+1 | CI + CCM
CMNCM−1                     | ∅ | W0    | CI + CCM
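With the transition probabilities and costs defined, the finite horizon model can be solved by backward induction over the stages (the value iteration algorithm mentioned above). The sketch below is a generic, hypothetical implementation: the `controls`, `transition` and `cost` interfaces are assumptions, and maintenance states, whose decision space is empty, are expected to expose a single dummy "continue" control so the recursion stays uniform.

```python
def backward_induction(N, states, controls, transition, cost, terminal_cost):
    """Finite-horizon value iteration for a model of the kind above.

    controls(x) -> iterable of admissible controls in state x
    transition(x, u, k) -> {next_state: probability}
    cost(x, u, j, k) -> stage-k transition cost Ck(j, u, i)
    Returns the cost-to-go tables J[k][x] and an optimal policy mu[k][x].
    """
    J = [dict() for _ in range(N + 1)]
    mu = [dict() for _ in range(N)]
    for x in states:
        J[N][x] = terminal_cost(x)      # terminal cost CN
    for k in range(N - 1, -1, -1):      # backward in time
        for x in states:
            best_u, best_v = None, float('inf')
            for u in controls(x):
                v = sum(p * (cost(x, u, j, k) + J[k + 1][j])
                        for j, p in transition(x, u, k).items())
                if v < best_v:
                    best_u, best_v = u, v
            J[k][x] = best_v
            mu[k][x] = best_u
    return J, mu
```

On a toy two-state instance (a working state with an optional replacement control and a failed state with a forced repair), the recursion picks the cheaper of operating and maintaining at each stage.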
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high, or if the infrastructure needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat may be necessary for certain maintenance actions. The price of their rental can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

NC: number of components
NWc: number of working states for component c
NPMc: number of preventive maintenance states for component c
NCMc: number of corrective maintenance states for component c

Costs

CPMc: cost per stage of preventive maintenance for component c
CCMc: cost per stage of corrective maintenance for component c
CNc(i): terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}: state of component c at the current stage
iNC+1: electricity state at the current stage
jc, c ∈ {1, ..., NC}: state of component c at the next stage
jNC+1: electricity state at the next stage
uc, c ∈ {1, ..., NC}: decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}: state of component c at stage k
xc: a component state
xNC+1k: electricity state at stage k
uck: maintenance decision for component c at stage k

Probability functions

λc(i): failure probability function for component c

Sets

Ωxc: state space for component c
ΩxNC+1: electricity state space
Ωuc(ic): decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component, to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for the one-component model. The state space of component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity space

Same as for the one-component model in Section 9.1.4.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}; Ωuc(ic) = ∅ otherwise.
9.2.4.3 Transition Probabilities

The component state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.4.3.
Component state transitions

The component state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1. If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0, then

P((j1, ..., jNC), (0, ..., 0), (i1, ..., iNC)) = Π_{c=1}^{NC} P(jc, 0, ic)

Case 2. If one of the components is in maintenance, or preventive maintenance is decided, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = Π_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic) if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1 if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic
P^c = 0 otherwise
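The Case 1 product form can be sketched as a fold over per-component next-state distributions (a generic, hypothetical helper, not from the thesis):

```python
def product_distribution(per_component):
    """Joint next-state distribution over component-state vectors, as the
    product of independent per-component distributions {j_c: p_c}
    (Case 1 above, where every component ages independently)."""
    joint = {(): 1.0}
    for dist in per_component:
        # extend every partial state vector by each possible next state
        joint = {vec + (j,): p * q
                 for vec, p in joint.items()
                 for j, q in dist.items()}
    return joint
```

Note that the size of the joint distribution grows as the product of the per-component supports, which is the curse of dimensionality discussed in Chapter 8.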
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1. If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0, then

C((j1, ..., jNC), (0, ..., 0), (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2. When the system is in maintenance or fails during the stage, an interruption cost CI is incurred, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} C^c

with

C^c = CCMc if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
C^c = CPMc if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
C^c = 0 otherwise
93 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be performed at the same time. A solution would be to consider a global decision space rather than an individual decision space for each component state variable.
• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods for solving infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm was empirically shown to converge fastest, although for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities over a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly a finite horizon model, or a discounted infinite horizon model, which approximates a finite horizon model but must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of the complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
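The backward recursion is easy to check mechanically. The sketch below re-runs the value iteration with the stage costs C(k, i, u) transcribed from the calculations (u denotes the successor state); the nested-dict data layout is an assumption of the sketch, not thesis notation.

```python
# C[k][i][u]: cost of going from state i at stage k to state u at stage k+1
C = [
    {0: {0: 2, 1: 4, 2: 3}},                                     # stage 0 (node A)
    {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},   # stage 1 (B, C, D)
    {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},   # stage 2 (E, F, G)
    {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                           # stage 3 (H, I, J)
]

J = {0: 0}            # terminal cost phi(0) = 0 at stage 4
policy = []
for k in range(3, -1, -1):                 # backward over stages 3, 2, 1, 0
    Jk, uk = {}, {}
    for i, arcs in C[k].items():
        u_best = min(arcs, key=lambda u: arcs[u] + J[u])
        Jk[i] = arcs[u_best] + J[u_best]
        uk[i] = u_best
    J, policy = Jk, [uk] + policy

print(J[0])        # optimal cost from A: 8
print(policy[0])   # optimal first decision
```

Note that Python's `min` breaks ties by returning the first minimizer, so at node C (where two decisions both cost 6) the script reports only one of the two optimal decisions.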
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306. SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers / Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of the 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems – life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems – cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants – introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
6.7 Linear Programming 34
6.8 Efficiency of the Algorithms 35
6.9 Semi-Markov Decision Process 35

7 Approximate Methods for Markov Decision Process - Reinforcement Learning 37
7.1 Introduction 37
7.2 Direct Learning 38
7.3 Indirect Learning 41
7.4 Supervised Learning 42

8 Review of Models for Maintenance Optimization 43
8.1 Finite Horizon Dynamic Programming 43
8.2 Infinite Horizon Stochastic Models 44
8.3 Reinforcement Learning 45
8.4 Conclusions 45

9 A Proposed Finite Horizon Replacement Model 47
9.1 One-Component Model 47
9.2 Multi-Component model 55
9.3 Possible Extensions 59
10 Conclusions and Future Work 61
A Solution of the Shortest Path Example 63
Reference List 65
Chapter 1
Introduction
1.1 Background
Market and competition rules have been introduced among power system companies due to the restructuring and deregulation of modern power systems. The generating companies, as well as the transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.
Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 2.1).
CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruptions, possible deterioration of the system, human risks or environmental consequences, etc.
PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components, and Condition Based Maintenance (CBM) for the components that are worthwhile and not too expensive to monitor. These maintenance actions have costs for unsupplied energy, inspection, repair, replacement, etc.
Efficient maintenance should balance corrective and preventive maintenance to minimize the total cost of maintenance.
The probability of a functional failure of a component is stochastic. The probability depends on the state of the component, resulting from its history (age, intensity of use, external stress such as weather, maintenance actions, human errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that explicitly integrate stochastic behavior. This feature makes the models interesting and was the starting idea of this work.
1.2 Objective
The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization and to identify possible future applications in power systems.
1.3 Approach
The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.
The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.
Some SDP models found in the literature were reviewed. Conclusions were drawn about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.
A finite horizon replacement model was developed to illustrate the possible use of SDP for power system maintenance.
1.4 Outline
Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.
Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.
Chapters 4-7 focus on different Dynamic Programming (DP) approaches and algorithms to solve them. The assumptions of the models and their practical limitations are discussed. The basics of DP formulation are introduced with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods for finite and infinite horizons, respectively. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), an approach to solving infinite horizon Dynamic Programming problems with approximate methods.
Chapter 8 reviews some maintenance optimization models based on dynamic programming. Conclusions are drawn about the possible use of the different approaches in maintenance optimization.
Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization
Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.
Chapter 2
Maintenance
The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1 and some maintenance optimization models are reviewed in Section 2.2.
2.1 Types of Maintenance
Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item, intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 2.1 shows a general picture of the different types of maintenance.
Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way, or it is not worthwhile, to detect or prevent a failure.
Preventive maintenance aims at undertaking maintenance actions on a component before it fails, to e.g. avoid the high costs of replacement, unsupplied power delivery and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:
1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.
Figure 2.1: Maintenance tree, based on [1]. (The tree divides maintenance into Corrective Maintenance and Preventive Maintenance; preventive maintenance into Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM); and CBM into continuous, scheduled and inspection based.)
2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods that use diagnostics or inspections to decide on maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.
2.2 Maintenance Optimization Models
Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in very high costs.
The aim of maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.
Numerous maintenance optimization models have been proposed in the literature and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influence factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.
In this section, the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].
2.2.1 Age Replacement Policies
Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow and Proschan [7] describe a basic age replacement model.
A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.
A model with minimal repair is discussed in [6]: if the component fails, it can be repaired to the same condition as before the failure occurred.
An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process whose rate is not stationary). Two types of failures can result from the shocks: minor failures, removed by minimal repair, and major failures, removed by replacement.
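The age replacement trade-off can be illustrated numerically with the classical renewal-reward cost rate g(T) = (c_p R(T) + c_f F(T)) / ∫_0^T R(t) dt, where R is the survival function, F = 1 - R, and c_p, c_f are the preventive and corrective replacement costs (see e.g. [7]). The Weibull lifetime and all parameter values below are invented for illustration; the integral is approximated with the trapezoidal rule.

```python
import math

beta, eta = 2.5, 10.0     # Weibull shape/scale: increasing failure rate (invented)
c_p, c_f = 1.0, 5.0       # preventive vs corrective replacement cost (invented)

def survival(t):
    """Weibull survival function R(t)."""
    return math.exp(-((t / eta) ** beta))

def cost_rate(T, n=2000):
    """g(T) = expected cost per cycle / expected cycle length."""
    F = 1.0 - survival(T)
    # expected cycle length: integral of R(t) on [0, T], trapezoidal rule
    h = T / n
    mean_len = h * (0.5 * (survival(0.0) + survival(T))
                    + sum(survival(k * h) for k in range(1, n)))
    return (c_p * survival(T) + c_f * F) / mean_len

# coarse grid search for the optimal replacement age
T_best = min((0.1 * k for k in range(1, 300)), key=cost_rate)
```

Because the failure rate increases (beta > 1) and a corrective replacement is more expensive than a preventive one, g(T) has an interior minimum: replacing somewhere before the mean lifetime beats both very frequent replacement and run-to-failure.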
2.2.2 Block Replacement Policies
In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow and Proschan [7] describe a basic block replacement model. To avoid replacing again a component that has just been replaced, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.
This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
2.2.3 Condition Based Maintenance
CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to optimization is to identify the relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.
One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.
For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2], the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9], a Semi-Markov Decision Process (SMDP, see Section 6.9) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.
An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (monitored variables).
2.2.4 Opportunistic Maintenance Models
Opportunistic maintenance considers unexpected opportunities of performing preventive maintenance: with the failure of one component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example. The trip to the wind farm by boat or helicopter is necessary and can be very expensive, so by grouping maintenance actions money can be saved.
Haurie and L'Ecuyer [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.
A rolling horizon dynamic programming algorithm is proposed in [45] to take short-term information into account. The model can be used for many maintenance optimization models.
2.2.5 Other Types of Models and Criteria of Classification
Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.
Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g., multi-component models are more interesting in power systems. The time horizon considered in the model is also important: many articles consider an infinite time horizon, but more focus should be put on finite horizons since they are more practical. Another characteristic of a model is the time representation, i.e. whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.
The method used for solving the problem has an influence on the solution: a model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power
System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.
3.1 Power System Presentation
Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables with limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description
A simple description of the power system includes the following main parts:
1. Generation: the generation units that produce the power. These can be e.g. hydro power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.
2. Transmission: the transmission system is composed of high voltage, high power lines. This part of the system is in general meshed. The transmission system connects the distribution systems with the generation units.

3. Distribution: the distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).
4. Consumption: consumers can be divided into different categories, such as industry, commercial, household, office, agriculture, etc. The costs of interruption are in general different for the different categories of consumers and also depend on the time of the outage.
The trade of electricity between producers and consumers is made through different specific markets around the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.
The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as blackouts.
3.1.2 Maintenance in Power Systems
The objective is to find the right way to do maintenance: Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydro power). RCM is a structured approach to finding a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at the KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is maintenance optimization. In Hilber and Bertling [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).
Research on power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
3.2 Costs
Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: the cost of the maintenance team that performs the maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can cause certain maintenance actions to be postponed; e.g. in very windy conditions it is not possible to perform maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs of an optimization model.

• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:
An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have any influence on the actual evolution of the system and the possible actions.

Basically, in maintenance problems it would mean that maintenance actions have an effect on the state of the system only directly after their accomplishment. They do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the actual state and the action made.

If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the actual state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization assumes implicitly that the system is used for an infinite time. It can be a good approximation if the lifetime of a system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the interval of time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω^X_k. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω^U_k(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated to the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:
J*_0(X_0) = min_{U_k} [ Σ_{k=0}^{N−1} C_k(X_k, U_k) + C_N(X_N) ]

Subject to: X_{k+1} = f_k(X_k, U_k), k = 0, ..., N−1
N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
X_k: state at stage k
U_k: decision (action) at stage k
C_k(i, u): cost function
C_N(i): terminal cost for state i
f_k(i, u): dynamic function
J*_0(i): optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:
J*_k(i) = min_{u∈Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]    (4.1)

J*_k(i): optimal cost-to-go from stage k to N, starting from state i
The value iteration algorithm is a direct consequence of the optimality equation
J*_N(i) = C_N(i), ∀i ∈ Ω^X_N

J*_k(i) = min_{u∈Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ], ∀i ∈ Ω^X_k

U*_k(i) = argmin_{u∈Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ], ∀i ∈ Ω^X_k
u: decision variable
U*_k(i): optimal decision (action) at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
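As an illustration, the backward recursion can be sketched in a few lines of Python. The interfaces (`states`, `actions`, `dynamics`, `cost`, `terminal_cost`) are hypothetical, introduced only for this sketch; the loop simply mirrors the three equations above.

```python
def value_iteration(states, actions, dynamics, cost, terminal_cost, N):
    """Backward value iteration for a deterministic finite-horizon DP.

    states[k]      : iterable of admissible states at stage k (k = 0..N)
    actions(k, i)  : admissible actions for state i at stage k
    dynamics(k, i, u), cost(k, i, u), terminal_cost(i): problem data
    (all hypothetical callables, not part of the thesis).
    Returns the cost-to-go tables J[k][i] and optimal decisions U[k][i].
    """
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:                        # initialisation: J*_N(i) = C_N(i)
        J[N][i] = terminal_cost(i)
    for k in range(N - 1, -1, -1):             # backwards, from stage N-1 to 0
        for i in states[k]:
            best_u = min(actions(k, i),
                         key=lambda u: cost(k, i, u) + J[k + 1][dynamics(k, i, u)])
            U[k][i] = best_u
            J[k][i] = cost(k, i, best_u) + J[k + 1][dynamics(k, i, best_u)]
    return J, U
```

The sketch assumes, as in the formulation above, that the dynamics are deterministic and that every reachable successor state appears in `states[k + 1]`.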
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: staged graph of the shortest path problem. Stage 0 contains node A; stage 1 contains nodes B, C, D; stage 2 contains nodes E, F, G; stage 3 contains nodes H, I, J; stage 4 contains node K. Each arc between consecutive stages is labelled with its cost.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated to each arc. A first way to solve the problem would be to calculate the cost of all possible paths; for example, the path A-B-F-J-K has a cost of 2+6+2+7 = 17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages (N = 4): k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:
Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:
Ω^U_k(i) = {0, 1} for i = 0; {0, 1, 2} for i = 1; {1, 2} for i = 2, for k = 1, 2, 3

Ω^U_0(0) = {0, 1, 2} for k = 0
For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E, or U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F, or u_1(2) = 2 for the transition D ⇒ G.
A sequence π = {µ_0, µ_1, ..., µ_N}, where µ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {µ*_0, µ*_1, ..., µ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined as equal to the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J*_0(0) = min_{U_k∈Ω^U_k(X_k)} [ Σ_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) ]

Subject to: X_{k+1} = f_k(X_k, U_k), k = 0, 1, 2, 3
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem. The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.
The solutions of the algorithm are given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {µ_0, µ_1, µ_2, µ_3, µ_4}, with µ_k(i) = u*_k(i) (for example, µ_1(1) = 2 and µ_1(2) = 2).
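The example can also be solved programmatically. Since the figure with the arc costs is not fully recoverable here, only the arcs quoted in the text are known (A-B = 2, B-E = 4, B-F = 6, F-J = 2, J-K = 7); the remaining costs below are hypothetical, chosen so that the documented optimum A-D-G-I-K has cost 8.

```python
# Arc costs of the staged graph. Known from the text: A-B=2, B-E=4, B-F=6,
# F-J=2, J-K=7. All other costs are hypothetical placeholders.
arcs = {
    'A': {'B': 2, 'C': 4, 'D': 3},
    'B': {'E': 4, 'F': 6},
    'C': {'E': 3, 'F': 2, 'G': 5},
    'D': {'F': 5, 'G': 2},
    'E': {'H': 2, 'I': 5},
    'F': {'H': 3, 'I': 4, 'J': 2},
    'G': {'I': 1, 'J': 4},
    'H': {'K': 5},
    'I': {'K': 2},
    'J': {'K': 7},
}

def shortest_path(arcs, start, goal):
    """Backward value iteration: J*(i) = min_u [ c(i, u) + J*(f(i, u)) ]."""
    J = {goal: 0}                     # terminal cost J*(K) = 0
    best = {}                         # optimal successor of each node
    # evaluate the nodes stage by stage, from stage 3 back to stage 0
    for i in ['H', 'I', 'J', 'E', 'F', 'G', 'B', 'C', 'D', 'A']:
        succ, c = min(arcs[i].items(), key=lambda a: a[1] + J[a[0]])
        J[i], best[i] = c + J[succ], succ
    # recover the optimal path forwards from the optimal decisions
    path, node = [start], start
    while node != goal:
        node = best[node]
        path.append(node)
    return J[start], path

cost, path = shortest_path(arcs, 'A', 'K')
```

With these (partly invented) costs, the sketch reproduces the optimal cost-to-go of 8 and the path A-D-G-I-K stated above.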
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below.
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω^X_k.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).
Dynamics of the System and Transition Probabilities

Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):
X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N−1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control at stage k are i and u. These probabilities can also depend on the stage:
P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:
P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated to each possible transition (i, j) and action u. The costs can also depend on the stage:
C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)
If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:
J*(X_0) = min_{U_k∈Ω^U_k(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

Subject to: X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N−1
N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
X_k: state at stage k
U_k: decision (action) at stage k
ω_k(i, u): probabilistic function of the disturbance
C_k(j, u, i): cost function
C_N(i): terminal cost for state i
f_k(i, u, ω): dynamic function
J*_0(i): optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:
J*_k(i) = min_{u∈Ω^U_k(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]    (5.1)
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]    (5.2)

Ω^X_k: state space at stage k
Ω^U_k(i): decision space at stage k for state i
P_k(j, u, i): transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursions, it determines at each stage the optimal decision for each state of the system.
J*_N(i) = C_N(i), ∀i ∈ Ω^X_N (initialisation)

While k ≥ 0 do:

J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ], ∀i ∈ Ω^X_k

U*_k(i) = argmin_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ], ∀i ∈ Ω^X_k

k ← k − 1
u: decision variable
U*_k(i): optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
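A sketch of this backward recursion in Python; the callables `P(k, j, u, i)`, `C(k, j, u, i)` and `CN(i)` are hypothetical interfaces for the transition probabilities, transition costs and terminal cost defined above.

```python
def stochastic_value_iteration(states, actions, P, C, CN, N):
    """Backward value iteration for a finite-horizon SDP (equation (5.2)).

    states[k]     : admissible states at stage k (k = 0..N)
    actions(k, i) : admissible actions for state i at stage k
    P(k, j, u, i) : probability of moving to j from i under u at stage k
    C(k, j, u, i) : transition cost; CN(i): terminal cost
    (all hypothetical callables).  Returns J[k][i] and U[k][i].
    """
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:                        # J*_N(i) = C_N(i)
        J[N][i] = CN(i)
    for k in range(N - 1, -1, -1):
        for i in states[k]:
            def expected(u):
                # expected one-step cost plus optimal cost-to-go from stage k+1
                return sum(P(k, j, u, i) * (C(k, j, u, i) + J[k + 1][j])
                           for j in states[k + 1])
            U[k][i] = min(actions(k, i), key=expected)
            J[k][i] = expected(U[k][i])
    return J, U
```

The only difference from the deterministic sketch is that the minimum is taken over an expectation, computed explicitly with the transition probabilities.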
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:

• N stages,

• N_X state variables, where the size of the set for each state variable is S,

• N_U control variables, where the size of the set for each control variable is A.
The time complexity of the algorithm is O(N · S^{2N_X} · A^{N_U}). The complexity of the problem increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
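To illustrate the bound, a rough operation count can be computed for hypothetical problem sizes (N = 50 stages, S = 10 values per state variable, A = 5 values per control variable; all numbers are illustrative, not from the thesis):

```python
# Operation count N * S**(2*NX) * A**NU from the complexity bound above,
# for hypothetical sizes: each extra state variable multiplies the work
# by S**2 = 100.
N, S, A, NU = 50, 10, 5, 2
for NX in (1, 2, 3):
    ops = N * S**(2 * NX) * A**NU
    print(NX, ops)   # 1 -> 125000, 2 -> 12500000, 3 -> 1250000000
```

Going from one to three state variables multiplies the work by a factor of 10,000, which is the practical meaning of the curse of dimensionality.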
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for the component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered in order to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure a component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Also, if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasts could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing the maintenance actions of offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few precedent states) to overcome this assumption: variables are added to the DP model to keep in memory the precedent states that were visited. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the precedent stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice, one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {µ, µ, µ, ...}, where µ is a function mapping the state space to the control space: for i ∈ Ω^X, µ(i) is an admissible control for the state i, µ(i) ∈ Ω^U(i).
The objective is to find the optimal µ*, i.e. the one that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are paid.
J*(X_0) = min_µ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, µ(X_k), X_k) ]

Subject to: X_{k+1} = f(X_k, µ(X_k), ω(X_k, µ(X_k))), k = 0, 1, ..., N−1
µ: decision policy
J*(i): optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1): the cost incurred at stage k has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum will converge (as a decreasing geometric progression).
J*(X_0) = min_µ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, µ(X_k), X_k) ]

Subject to: X_{k+1} = f(X_k, µ(X_k), ω(X_k, µ(X_k))), k = 0, 1, ..., N−1
α: discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize:
J* = min_µ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, µ(X_k), X_k) ]

Subject to: X_{k+1} = f(X_k, µ(X_k), ω(X_k, µ(X_k))), k = 0, 1, ..., N−1
6.2 Optimality Equations
The optimality equations are formulated using the transition probabilities P_ij(u).

The stationary policy µ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P_ij(u) · [ C_ij(u) + J*(j) ], ∀i ∈ Ω^X

J_µ(i): cost-to-go function of policy µ starting from state i
J*(i): optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:
J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P_ij(u) · [ C_ij(u) + α · J*(j) ], ∀i ∈ Ω^X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively the algorithm should converge to the optimal policy, and it can indeed be shown that the algorithm converges to the optimal solution. If the model is discounted, then the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative criterion must be determined to stop the algorithm.
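A minimal sketch of value iteration for the discounted case, assuming hypothetical dict-based containers `P[i][u][j]` and `C[i][u][j]` for the transition probabilities and costs, and a stopping tolerance on the largest update:

```python
def discounted_value_iteration(states, actions, P, C, alpha, tol=1e-9):
    """Value iteration for a discounted infinite-horizon MDP.

    actions[i] lists the admissible controls in state i; P[i][u][j] and
    C[i][u][j] hold the transition probabilities and costs (hypothetical
    containers).  Repeatedly applies the Bellman operator of the discounted
    optimality equation until the sup-norm of the update is below tol.
    """
    J = {i: 0.0 for i in states}
    while True:
        J_new = {i: min(sum(P[i][u][j] * (C[i][u][j] + alpha * J[j])
                            for j in states)
                        for u in actions[i])
                 for i in states}
        if max(abs(J_new[i] - J[i]) for i in states) < tol:
            return J_new
        J = J_new
```

Since each sweep is a contraction with modulus α, the number of sweeps needed grows with 1/(1−α), matching the complexity remark above.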
An alternative to this method is the Policy Iteration (PI) algorithm, which terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy µ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the actual policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy µ_0. Then it can be described by the following steps:
Step 1: Policy Evaluation

If µ_{q+1} = µ_q, stop the algorithm. Else, J_{µ_q}(i), the solution of the following linear system, is calculated:

J_{µ_q}(i) = Σ_{j∈Ω^X} P(j, µ_q(i), i) · [ C(j, µ_q(i), i) + J_{µ_q}(j) ], ∀i ∈ Ω^X

q: iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy µ_q.
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

µ_{q+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + J_{µ_q}(j) ], ∀i ∈ Ω^X
Go back to the policy evaluation step. The process stops when µ_{q+1} = µ_q.
At each iteration the algorithm always improves the policy. If the initial policy µ_0 is already good, then the algorithm will converge fast to the optimal solution.
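The two steps can be sketched as follows, for a discounted MDP with hypothetical containers `P[i][u][j]` and `C[i][u][j]`. As a simplification (not the thesis's formulation), the evaluation step here iterates the fixed-point equation many times instead of solving the linear system exactly; for α < 1 this converges to the same J_µ.

```python
def policy_iteration(states, actions, P, C, alpha, eval_sweeps=1000):
    """Policy iteration for a discounted MDP (hypothetical containers).

    Step 1 evaluates the current policy mu by iterating
    J <- C_mu + alpha * P_mu * J (an approximation of the exact linear solve);
    Step 2 improves the policy greedily.  Stops when the policy is a
    solution of its own improvement.
    """
    mu = {i: actions[i][0] for i in states}          # arbitrary initial policy
    while True:
        # Step 1: policy evaluation
        J = {i: 0.0 for i in states}
        for _ in range(eval_sweeps):
            J = {i: sum(P[i][mu[i]][j] * (C[i][mu[i]][j] + alpha * J[j])
                        for j in states)
                 for i in states}
        # Step 2: policy improvement
        mu_new = {i: min(actions[i],
                         key=lambda u: sum(P[i][u][j] * (C[i][u][j] + alpha * J[j])
                                           for j in states))
                  for i in states}
        if mu_new == mu:
            return mu, J
        mu = mu_new
```

Replacing the long evaluation loop by a small, fixed number of sweeps turns this sketch into the modified policy iteration discussed next.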
6.5 Modified Policy Iteration
If the number of states is large, solving the linear problem of the policy evaluation step can be computationally intensive.

An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{µ_k}(i) that must be chosen higher than the real value J_{µ_k}(i).
While m ≥ 0 do:

J^m_{µ_k}(i) = Σ_{j∈Ω^X} P(j, µ_k(i), i) · [ C(j, µ_k(i), i) + J^{m+1}_{µ_k}(j) ], ∀i ∈ Ω^X

m ← m − 1
m: number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{µ_k} is approximated by J^0_{µ_k}.
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy µ and a state X ∈ Ω^X, there are a unique scalar λ_µ and vector h_µ such that:

h_µ(X) = 0

λ_µ + h_µ(i) = Σ_{j∈Ω^X} P(j, µ(i), i) · [ C(j, µ(i), i) + h_µ(j) ], ∀i ∈ Ω^X
This λ_µ is the average cost-to-go for the stationary policy µ. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation
λ* + h*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + h*(j) ], ∀i ∈ Ω^X

µ*(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + h*(j) ], ∀i ∈ Ω^X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state, and h_0(i) is chosen arbitrarily.
H_k = min_{u∈Ω^U(X)} Σ_{j∈Ω^X} P(j, u, X) · [ C(j, u, X) + h_k(j) ]

h_{k+1}(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + h_k(j) ] − H_k, ∀i ∈ Ω^X

µ_{k+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + h_k(j) ], ∀i ∈ Ω^X
The sequence h_k will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is infinite in theory.
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: X can be chosen arbitrarily.
Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i), ∀i ∈ Ω^X, stop the algorithm. Else, solve the system of equations:

h_q(X) = 0
λ_q + h_q(i) = Σ_{j∈Ω^X} P(j, µ_q(i), i) · [ C(j, µ_q(i), i) + h_q(j) ], ∀i ∈ Ω^X
Step 2: Policy improvement

µ_{q+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + h_q(j) ], ∀i ∈ Ω^X

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP, the optimality equation is

J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + α · J*(j) ], ∀i ∈ Ω^X

and J*(i) is the solution of the following linear programming model:
Maximize Σ_{i∈Ω^X} J(i)

Subject to: J(i) − α · Σ_{j∈Ω^X} P(j, u, i) · J(j) ≤ Σ_{j∈Ω^X} P(j, u, i) · C(j, u, i), ∀u, i
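As an illustration, the constraint matrix of such an LP can be built explicitly for a small, hypothetical two-state discounted MDP (the data below are invented for this sketch); any LP solver can then be applied to the resulting pair (A, b).

```python
# Build the LP constraints J(i) - alpha * sum_j P(j,u,i) J(j)
#                          <= sum_j P(j,u,i) C(j,u,i)  for every pair (i, u),
# for a hypothetical two-state discounted MDP.
alpha = 0.5
states = [0, 1]
actions = {0: [0, 1], 1: [0]}
P = {0: {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}, 1: {0: {0: 0.0, 1: 1.0}}}
C = {0: {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}, 1: {0: {0: 0.0, 1: 0.0}}}

A, b = [], []                        # one constraint row per pair (i, u)
for i in states:
    for u in actions[i]:
        row = [(1.0 if j == i else 0.0) - alpha * P[i][u][j] for j in states]
        A.append(row)
        b.append(sum(P[i][u][j] * C[i][u][j] for j in states))

# The optimal cost-to-go of this MDP is J* = (2, 0); it satisfies every
# constraint, with equality for the actions of an optimal policy (here all
# rows are tight, since both actions in state 0 happen to be optimal).
J_star = [2.0, 0.0]
slack = [b[r] - sum(A[r][j] * J_star[j] for j in states) for r in range(len(A))]
```

Maximizing Σ_i J(i) over this feasible set pushes every J(i) up against its tightest constraint, which is exactly the min in the optimality equation.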
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy µ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).
SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, but the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL) or Approximate Dynamic Programming (ADP) isan approach of machine learning that combines infinite horizon dynamic program-ming with supervised learning techniques Supervised learning techniques give thepossibility to approximate the cost-to-go function on a large state space
The aim of this chapter is to give an overview of RL. For further reading, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the construction of functions from training data (input-output pairs), in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i), if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ, using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.

The cost-to-go resulting from the trajectory starting from state Xk is

V(Xk) = Σ_{n=k}^{N-1} C(Xn, Xn+1)

V(Xk): cost-to-go of a trajectory starting from state Xk.
If a certain number of trajectories have been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): cost-to-go of the trajectory starting from state i after the m-th visit.
A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(Xk) = J(Xk) + γ_{Xk} · [V(Xk) − J(Xk)]

γ_{Xk} corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) must be calculated from the whole trajectory, and can thus only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = C(Xk, Xk+1) + V(Xk+1).
At each transition of the trajectory, the cost-to-go function J(Xk) is updated for all the states visited previously during the trajectory. Assuming that the l-th transition has just been generated, J(Xk) is updated for all states visited so far:

J(Xk) = J(Xk) + γ_{Xk} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l
TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) = J(Xk) + γ_{Xk} · λ^{l−k} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0. The TD(0) algorithm is

J(Xk) = J(Xk) + γ_{Xk} · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
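As an illustration, the TD(0) update can be sketched as follows. The simulator function, the fixed start state, and the step size schedule γ = 1/m are placeholder choices for this sketch, not part of the thesis model.

```python
def td0_evaluate(sample_transition, n_states, terminal, episodes=2000):
    """Estimate the cost-to-go J of a fixed policy with TD(0).

    sample_transition(state) -> (next_state, cost) simulates one step of
    the system under the policy being evaluated (stochastic shortest
    path setting: every trajectory reaches the terminal state).
    """
    J = [0.0] * n_states
    visits = [0] * n_states
    for _ in range(episodes):
        state = 0  # hypothetical fixed start state
        while state != terminal:
            nxt, cost = sample_transition(state)
            visits[state] += 1
            gamma = 1.0 / visits[state]  # step size 1/m, as in the text
            # TD(0): move J(X_k) toward C(X_k, X_k+1) + J(X_k+1)
            J[state] += gamma * (cost + J[nxt] - J[state])
            state = nxt
    return J
```

For a deterministic chain 0 → 1 → 2 with unit step costs and terminal state 2, the estimates converge to J(0) = 2 and J(1) = 1.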
Q-factors. Once J^{μk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^{μk}(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J^{μk}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈ΩU(i)} Q^{μk}(i, u)
This is in fact an approximate version of the policy iteration algorithm, since J^{μk} and Q^{μk} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)    (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]    (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3):
Q(i, u) can be initialized arbitrarily.

For each sample (Xk, Xk+1, Uk, Ck), do

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) = (1 − γ) · Q(Xk, Uk) + γ · [C(Xk, Xk+1, Uk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
The exploration/exploitation trade-off. Convergence of the algorithms to the optimal solution requires that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
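A minimal sketch of Q-learning with an ε-greedy exploration/exploitation trade-off is given below. The fixed start state, the constant step size, and the value of ε are illustrative assumptions, not prescribed by the text.

```python
import random

def q_learning(n_states, n_actions, sample_step, terminal,
               gamma=0.1, epsilon=0.1, episodes=3000):
    """Q-learning with epsilon-greedy exploration (costs are minimized).

    sample_step(state, action) -> (next_state, cost) simulates one
    transition of the system; no explicit model P, C is required.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0  # hypothetical fixed start state
        while state != terminal:
            if random.random() < epsilon:
                u = random.randrange(n_actions)  # exploration phase
            else:
                u = min(range(n_actions), key=lambda a: Q[state][a])  # greedy
            nxt, cost = sample_step(state, u)
            # Q(X_k, U_k) = (1 - gamma) Q(X_k, U_k) + gamma [C_k + min_u Q(X_k+1, u)]
            Q[state][u] = (1 - gamma) * Q[state][u] + gamma * (cost + min(Q[nxt]))
            state = nxt
    return Q
```

The row Q[terminal] is never updated and stays at zero, which matches the shortest-path convention of a cost-free terminal state.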
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section for each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation using direct learning.
7.4 Supervised Learning
With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems; for large state and control spaces, however, they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function Jμ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the table representation investigated previously, Jμ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) − J(i, r).
There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, and Bayesian statistics.
A general approach to a supervised learning problem is:

• Determine an adequate structure for the approximating function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Choose a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
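As a sketch of such an approximation structure, the example below fits a linear architecture J(i, r) = r · φ(i) to sampled cost-to-go values by least squares; the feature map and the samples are hypothetical.

```python
def fit_linear_cost_to_go(samples, features):
    """Least-squares fit of an approximate cost-to-go J(i, r) = r . phi(i).

    samples:  list of (state, observed cost-to-go) pairs, e.g. Monte
              Carlo returns of a fixed policy.
    features: phi, mapping a state to its list of feature values.
    Solves the normal equations (sum phi phi^T) r = (sum phi V) by
    Gaussian elimination with partial pivoting.
    """
    n = len(features(samples[0][0]))
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for state, value in samples:
        phi = features(state)
        for p in range(n):
            b[p] += phi[p] * value
            for q in range(n):
                A[p][q] += phi[p] * phi[q]
    for col in range(n):  # forward elimination
        piv = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for q in range(col, n):
                A[row][q] -= f * A[col][q]
            b[row] -= f * b[col]
    r = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        r[row] = (b[row] - sum(A[row][q] * r[q]
                               for q in range(row + 1, n))) / A[row][row]
    return r
```

Only the parameter vector r is stored, instead of one table entry per state.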
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure during the stage of a unit not in maintenance. The failure rates
are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5, and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDPs have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article motivates the use of RL for monitoring and maintenance of power plants, the main advantage given being the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is discussed. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence
of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods are recommended.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 summarizes the models and the most important methods.
Table 8.1 Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Methods / remarks: value iteration; limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; average cost-to-go, discounted, or shortest path criteria
  Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization; short-term maintenance optimization
  Methods / remarks: value iteration (VI), which can converge fast for a high discount factor; policy iteration (PI), faster in general; linear programming, which allows additional constraints but handles a smaller state space than VI and PI

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval (average cost-to-go approach); complex
  Possible application in maintenance optimization: inspection-based maintenance optimization
  Methods / remarks: same as MDP

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces compared with classical MDP methods
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods / remarks: TD-learning and Q-learning; can work without an explicit model
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
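Both models reduce to the standard backward recursion of finite horizon dynamic programming, J_k(i) = min_{u∈ΩU(i)} Σ_j P(j, u, i) · [C(j, u, i) + J_{k+1}(j)]. A generic sketch follows; the model data (P, C, actions, terminal cost) are supplied by the caller and are hypothetical, and P and C are taken stationary for simplicity.

```python
def finite_horizon_vi(n_stages, states, actions, P, C, terminal_cost):
    """Backward value iteration for a finite horizon SDP.

    P(j, u, i): transition probability, C(j, u, i): transition cost,
    actions(i): decision space of state i (assumed non-empty here),
    terminal_cost(i): cost of ending in state i at stage N.
    Returns cost-to-go tables J[k][i] and a policy mu[k][i].
    """
    J = [dict() for _ in range(n_stages + 1)]
    mu = [dict() for _ in range(n_stages)]
    for i in states:
        J[n_stages][i] = terminal_cost(i)
    for k in range(n_stages - 1, -1, -1):  # backward recursion
        for i in states:
            best_u, best = None, float("inf")
            for u in actions(i):
                cost = sum(P(j, u, i) * (C(j, u, i) + J[k + 1][j])
                           for j in states)
                if cost < best:
                    best_u, best = u, cost
            J[k][i], mu[k][i] = best, best_u
    return J, mu
```

An optimal decision for every state and stage is obtained in one backward sweep, which is what makes the finite horizon formulation attractive for short-term scheduling.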
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to
do maintenance immediately, to be operational later and avoid maintenance during a profitable period. This idea was considered in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium, and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE  Number of electricity scenarios
NW  Number of working states for the component
NPM  Number of preventive maintenance states for the component
NCM  Number of corrective maintenance states for the component
Costs
CE(s, k)  Electricity price at stage k in electricity state s
CI  Cost per stage for interruption
CPM  Cost per stage of preventive maintenance
CCM  Cost per stage of corrective maintenance
CN(i)  Terminal cost if the component is in state i
Variables
i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage
State and Control Space
x1k  Component state at stage k
x2k  Electricity state at stage k
Probability functions
λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi
Sets
Ωx1  Component state space
Ωx2  Electricity state space
ΩU(i)  Decision space for state i
State notations
W  Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.
• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.
• If the system is not working, a cost for interruption of CI per stage is considered.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).
The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2    (9.1)
Ωx1 is the set of possible states for the component, and Ωx2 is the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by the state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component are NCM and NPM, respectively.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case, Tmax can for example correspond to the time when λ(t) > 50% for t > Tmax. This approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
Figure 9.1 Example of the Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state

Electricity scenarios are associated with the state variable x2k. There are NE possible states for this variable, each corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium, and low electricity prices (respectively dry, normal, and wet years). The weather during the season influences the water reserve in a country like Sweden. Hydro power is a large part of the electricity generation in Sweden, and moreover a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2 Example of electricity scenarios, NE = 3 (electricity price in SEK/MWh as a function of the stage, for scenarios 1-3).
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, ΩU(i) = ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E, and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1 Transition probabilities

i1                        u  j1     P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}    0  Wq+1   1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}    0  CM1    λ(Wq)
WNW                       0  WNW    1 − λ(WNW)
WNW                       0  CM1    λ(WNW)
Wq, q ∈ {0, ..., NW}      1  PM1    1
PMq, q ∈ {1, ..., NPM−2}  ∅  PMq+1  1
PMNPM−1                   ∅  W0     1
CMq, q ∈ {1, ..., NCM−2}  ∅  CMq+1  1
CMNCM−1                   ∅  W0     1
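The rows of Table 9.1 can be assembled into explicit transition matrices. The sketch below does this for a hypothetical per-stage failure probability λ(Wq); the string state names and the handling of the NPM = 1 / NCM = 1 special case follow the text.

```python
def build_component_chain(n_w, n_pm, n_cm, lam):
    """Assemble the one-component transition probabilities of Table 9.1.

    States: W0..W_NW, PM1..PM_{NPM-1}, CM1..CM_{NCM-1}; lam(q) is the
    per-stage failure probability lambda(W_q) (a placeholder function).
    Returns the state list, a name->index map, and one transition
    matrix per decision u (u=0: no PM, u=1: start PM).
    """
    states = ([f"W{q}" for q in range(n_w + 1)]
              + [f"PM{q}" for q in range(1, n_pm)]
              + [f"CM{q}" for q in range(1, n_cm)])
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    P = {0: [[0.0] * n for _ in range(n)], 1: [[0.0] * n for _ in range(n)]}
    for q in range(n_w + 1):  # working states
        ageing = f"W{min(q + 1, n_w)}"  # W_NW keeps its age
        P[0][idx[f"W{q}"]][idx[ageing]] += 1.0 - lam(q)
        P[0][idx[f"W{q}"]][idx["CM1" if n_cm > 1 else "W0"]] += lam(q)
        P[1][idx[f"W{q}"]][idx["PM1" if n_pm > 1 else "W0"]] += 1.0
    for q in range(1, n_pm):  # forced preventive maintenance progression
        P[0][idx[f"PM{q}"]][idx[f"PM{q + 1}" if q < n_pm - 1 else "W0"]] = 1.0
    for q in range(1, n_cm):  # forced corrective maintenance progression
        P[0][idx[f"CM{q}"]][idx[f"CM{q + 1}" if q < n_cm - 1 else "W0"]] = 1.0
    return states, idx, P
```

The maintenance states have an empty decision space; for convenience their forced transitions are stored in the u = 0 matrix.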
Table 9.2 Example of transition matrices for the electricity scenarios

P1E = [1 0 0; 0 1 0; 0 0 1]
P2E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]
P3E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3 Example of transition probabilities on a 12-stage horizon

Stage (k)   0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)  P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
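With the matrices of Table 9.2 and the stage assignment of Table 9.3, the scenario distribution can be propagated through the horizon. The initial distribution used below is an arbitrary example.

```python
P1E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2E = [[1 / 3] * 3 for _ in range(3)]
P3E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Stage-to-matrix assignment of Table 9.3 (stages k = 0..11)
stage_matrices = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def propagate(dist, matrices):
    """Propagate a scenario distribution: dist_{k+1}[j] = sum_i dist_k[i] P_k[i][j]."""
    for P in matrices:
        dist = [sum(dist[i] * P[i][j] for i in range(len(dist)))
                for j in range(len(P[0]))]
    return dist
```

Starting from scenario S1 with certainty, the distribution becomes uniform once a P2E stage has been passed, since P2E mixes the scenarios completely.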
9.1.4.4 Cost Function
The costs associated with the possible transitions are of different kinds:

• reward for electricity generation, G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k);
• costs for maintenance, CCM or CPM;
• cost for interruption, CI.

Moreover, a terminal cost, denoted CN(i), can be used to penalize deviations from a required state at the end of the time horizon; it is defined for each possible terminal component state i. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4 Transition costs

i1                        u  j1     Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}    0  Wq+1   G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}    0  CM1    CI + CCM
WNW                       0  WNW    G · Ts · CE(i2, k)
WNW                       0  CM1    CI + CCM
Wq                        1  PM1    CI + CPM
PMq, q ∈ {1, ..., NPM−2}  ∅  PMq+1  CI + CPM
PMNPM−1                   ∅  W0     CI + CPM
CMq, q ∈ {1, ..., NCM−2}  ∅  CMq+1  CI + CCM
CMNCM−1                   ∅  W0     CI + CCM
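The entries of Table 9.4 can be collected in a single stage-cost function. The sketch below uses the string state names introduced earlier ('W3', 'PM1', ...); all parameter values are placeholders, and the production term is returned with the sign convention of the table (as a reward magnitude).

```python
def transition_cost(i1, u, j1, i2, k, G, Ts, C_E, C_I, C_PM, C_CM):
    """Stage cost C_k(j, u, i) following Table 9.4.

    i1/j1: component state names ('W2', 'PM1', 'CM1', ...), u: decision
    (0, 1, or None for an empty decision space), i2: electricity state,
    C_E(i2, k): electricity price function. All inputs are hypothetical.
    """
    if i1.startswith("W"):
        if u == 1:
            return C_I + C_PM          # preventive replacement starts
        if j1.startswith("CM"):
            return C_I + C_CM          # failure during the stage
        return G * Ts * C_E(i2, k)     # production reward (Table 9.4)
    if i1.startswith("PM"):
        return C_I + C_PM              # ongoing preventive maintenance
    return C_I + C_CM                  # ongoing corrective maintenance
```

Such a function, together with the transition probabilities of Table 9.1, is what the backward value iteration of the model needs as input.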
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but will need maintenance soon.
This can be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can then be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC  Number of components
NWc  Number of working states for component c
NPMc  Number of corrective-free (preventive) maintenance states for component c
NCMc  Number of corrective maintenance states for component c
Costs
CPMc  Cost per stage of preventive maintenance for component c
CCMc  Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}  State of component c at the current stage
iNC+1  Electricity state at the current stage
jc, c ∈ {1, ..., NC}  State of component c at the next stage
jNC+1  Electricity state at the next stage
uc, c ∈ {1, ..., NC}  Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}  State of component c at stage k
xc  A component state
xNC+1k  Electricity state at stage k
uck  Maintenance decision for component c at stage k
Probability functions
λc(i)  Failure probability function for component c
Sets
Ωxc  State space for component c
ΩxNC+1  Electricity state space
Ωuc(ic)  Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.
• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered, whatever maintenance is done on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)T    (9.2)

where xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.
Component Space
The number of CM and PM states for component c corresponds respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}
Electricity Space
Same as in Section 8.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system.
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)T    (9.3)
The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}
                    Ωuc(ic) = ∅ otherwise
9.2.4.3 Transition Probability
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1 | iNC+1)    (9.5)

The transition probabilities of the electricity states, P(jNC+1 | iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 8.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.
Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc}:

P((j1, ..., jNC) | 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc | 0, ic)
Case 2
If one of the components is in maintenance, or the decision of preventive maintenance is taken:

P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} Pc

with Pc = P(jc | 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
        = 1              if ic ∈ {W0, ..., WNWc−1}, uc = 0 and jc = ic
        = 0              else
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)
Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with Cc = CCMc  if ic ∈ {CM1, ..., CM(NCMc)} or jc = CM1
        = CPMc  if ic ∈ {PM1, ..., PM(NPMc)} or jc = PM1
        = 0     else
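To make the structure of the model concrete, the following is a minimal backward-induction sketch of a simplified two-component version in Python. It is an illustration only, not the thesis model itself: maintenance is shortened to a single PM/CM stage per component, the failure probabilities are constant rather than age-dependent, and the electricity price (and hence the reward G · Ts · CE) is a constant. All numerical values are assumptions.

```python
from itertools import product

NW = 3                               # working states W0, W1, W2 per component
STATES = list(range(NW)) + ["PM", "CM"]
NC = 2                               # two components in series
LAM = [0.1, 0.2]                     # per-stage failure probabilities (assumed)
C_PM, C_CM = [10, 12], [40, 50]      # per-stage maintenance costs (assumed)
C_I = 30                             # interruption cost (assumed)
REWARD = 25.0                        # G * Ts * CE, constant price (assumed)
T = 6                                # horizon in stages

def comp_transition(i, u, lam):
    """Transition distribution of one component when it ages or is maintained."""
    if i in ("PM", "CM"):
        return {0: 1.0}              # maintenance finished: as good as new (W0)
    if u == 1:
        return {"PM": 1.0}           # start preventive replacement
    nxt = min(i + 1, NW - 1)         # ageing, capped at the last working state
    return {"CM": lam, nxt: 1.0 - lam}

def joint_transition(state, u):
    """Joint distribution over next states, with the frozen-components rule:
    when the system is down, working components not being maintained do not age."""
    running = all(s not in ("PM", "CM") for s in state) and not any(u)
    dists = []
    for c, (s, uc) in enumerate(zip(state, u)):
        if running or uc == 1 or s in ("PM", "CM"):
            dists.append(comp_transition(s, uc, LAM[c]))
        else:
            dists.append({s: 1.0})   # frozen while the system is down
    out = {}
    for combo in product(*(d.items() for d in dists)):
        js = tuple(j for j, _ in combo)
        p = 1.0
        for _, q in combo:
            p *= q
        out[js] = out.get(js, 0.0) + p
    return out

def stage_cost(state, u, nxt):
    """Reward if the system runs the whole stage, else interruption + repairs."""
    if all(s not in ("PM", "CM") for s in state) and not any(u) \
            and all(j != "CM" for j in nxt):
        return -REWARD               # electricity produced, as a negative cost
    cost = C_I
    for c, (s, j) in enumerate(zip(state, nxt)):
        if s == "CM" or j == "CM":
            cost += C_CM[c]
        elif s == "PM" or j == "PM":
            cost += C_PM[c]
    return cost

def decisions(state):
    # PM can only be decided for components not already in maintenance
    opts = [(0, 1) if s not in ("PM", "CM") else (0,) for s in state]
    return list(product(*opts))

# Backward induction with zero terminal cost
J = {s: 0.0 for s in product(STATES, repeat=NC)}
policy = {}
for k in reversed(range(T)):
    newJ = {}
    for s in product(STATES, repeat=NC):
        vals = {u: sum(p * (stage_cost(s, u, j) + J[j])
                       for j, p in joint_transition(s, u).items())
                for u in decisions(s)}
        best = min(vals, key=vals.get)
        newJ[s], policy[s] = vals[best], best
    J = newJ
```

Note that the opportunistic effect discussed above appears automatically: once one component is down, the interruption cost CI is already being paid, so preventive replacement of the other component becomes comparatively cheaper.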
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:
• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space rather than an individual decision space for each component state variable.
• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods for solving infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm empirically converges the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
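To illustrate the infinite-horizon methods compared above, here is a minimal value iteration sketch on a toy two-state discounted maintenance MDP. The MDP itself (states working/failed, actions do-nothing/maintain) and all probabilities and costs are assumptions for illustration, not a model from this thesis.

```python
import numpy as np

# Toy discounted MDP: states {0: working, 1: failed},
# actions {0: do nothing, 1: maintain/repair}. All numbers are assumed.
P = {0: np.array([[0.9, 0.1],     # do nothing: a working unit may fail,
                  [0.0, 1.0]]),   # a failed unit stays failed
     1: np.array([[1.0, 0.0],     # preventive maintenance keeps it working
                  [1.0, 0.0]])}   # repair brings it back to working
C = {0: np.array([0.0, 10.0]),    # downtime cost while failed
     1: np.array([2.0, 6.0])}     # preventive vs corrective maintenance cost
gamma = 0.95                      # discount factor

def value_iteration(tol=1e-8):
    J = np.zeros(2)
    while True:
        # Bellman backup: Q[a, i] = C[a][i] + gamma * sum_j P[a][i, j] * J[j]
        Q = np.stack([C[a] + gamma * P[a] @ J for a in (0, 1)])
        newJ = Q.min(axis=0)
        if np.max(np.abs(newJ - J)) < tol:
            return newJ, Q.argmin(axis=0)
        J = newJ

J, policy = value_iteration()
# Policy iteration would instead alternate exact policy evaluation
# (a linear solve) with greedy policy improvement.
```

With these numbers, the optimal policy repairs a failed unit (cheaper than paying the downtime cost forever) and does nothing while the unit works.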
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either a finite horizon model directly, or a discounted infinite horizon model, which approximates a finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
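The backward recursion above can be reproduced with a short script. The arc costs C(k, i, j) below are read directly from the worked example (node letters A-J map to state indices per stage):

```python
# Arc costs C[k][(i, j)] from the worked example; missing pairs are arcs
# that do not exist in the graph.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},            # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6,                        # B -> E, F
        (1, 0): 2, (1, 1): 1, (1, 2): 3,             # C -> E, F, G
        (2, 1): 5, (2, 2): 2},                       # D -> F, G
    2: {(0, 0): 2, (0, 1): 5,                        # E -> H, I
        (1, 0): 7, (1, 1): 3, (1, 2): 2,             # F -> H, I, J
        (2, 1): 1, (2, 2): 2},                       # G -> I, J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},            # H, I, J -> terminal
}

N = 4
J = {N: {0: 0.0}}            # terminal cost phi(0) = 0
u = {}
for k in reversed(range(N)):
    J[k], u[k] = {}, {}
    for i in {i for (i, _) in C[k]}:
        # cost-to-go for each feasible decision (next node) j
        best = {j: c + J[k + 1][j] for (i2, j), c in C[k].items() if i2 == i}
        u[k][i] = min(best, key=best.get)
        J[k][i] = best[u[k][i]]

print(J[0][0])   # optimal cost from A
```

Running it confirms the hand computation: the optimal cost from A is 8, reached by choosing u = 2 at stage 0.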
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Chapter 1
Introduction
1.1 Background
Market and competition laws have been introduced among power system companies, due to the restructuring and deregulation of modern power systems. The generating companies, as well as the transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.
Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 2.1).
CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruptions, possible deterioration of the system, human risks or environmental consequences, etc.
PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components, and Condition Based Maintenance (CBM) for the components that are worth monitoring and not too expensive to monitor. These maintenance actions have a cost for unsupplied energy, inspection, repair, replacement, etc.
Efficient maintenance should balance corrective and preventive maintenance to minimize the total cost of maintenance.
The probability of a functional failure for a component is stochastic. The probability depends on the state of the component, resulting from its history (age, intensity of use, external stress (such as weather), maintenance actions, human errors, and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that explicitly integrate stochastic behavior. This feature makes the models interesting and was the starting idea of this work.
1.2 Objective
The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization, and to identify possible future applications in power systems.
1.3 Approach
The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.
The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.
Some SDP models found in the literature were reviewed, and conclusions were made about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.
A finite horizon replacement model was developed to illustrate the possible use ofSDP for power system maintenance
1.4 Outline
Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.
Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.
Chapters 4-7 focus on different Dynamic Programming (DP) approaches and algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP models are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods, respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving infinite horizon Dynamic Programming problems using approximate methods.
Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are made about the possible use of the different approaches in maintenance optimization.
Chapter 9 is an example of how finite horizon dynamic programming can be used for maintenance optimization.
Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.
Chapter 2
Maintenance
The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1, and some maintenance optimization models are reviewed in Section 2.2.
2.1 Types of Maintenance
Maintenance is "a combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required functions" [1]. Figure 2.1 shows a general picture of the different types of maintenance.
Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed in cases where there is no way, or it is not worthwhile, to detect or prevent a failure.
Preventive maintenance aims at undertaking maintenance actions on a component before it fails, e.g. to avoid the high costs of replacement, unsupplied power delivery, and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:
1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.
[Figure 2.1: Maintenance tree, based on [1]. Maintenance is divided into Preventive Maintenance and Corrective Maintenance; Preventive Maintenance comprises Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), the latter continuous, scheduled or inspection based.]
2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age related failures.
2.2 Maintenance Optimization Models
Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in a very high cost.

The aim of maintenance optimization could be to balance corrective and preventive maintenance, to minimize for example the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influencing factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.
In this section the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].
2.2.1 Age Replacement Policies
Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.
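The age replacement trade-off can be made concrete with the standard long-run cost-rate criterion (cf. [7], [17]): g(T) = [cp(1 − F(T)) + cf F(T)] / ∫0^T (1 − F(t)) dt, minimized over the replacement age T. The sketch below is illustrative only; the Weibull lifetime and all cost values are assumptions, not data from any of the cited models.

```python
import math

c_p, c_f = 1.0, 5.0      # preventive vs corrective replacement cost (assumed)
beta, eta = 2.5, 10.0    # Weibull shape/scale: increasing failure rate (assumed)

def F(t):
    """Weibull lifetime distribution function."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def cost_rate(T, n=2000):
    """g(T) = [c_p(1-F(T)) + c_f F(T)] / integral_0^T (1-F(t)) dt."""
    h = T / n
    integral = sum((1.0 - F(i * h)) * h for i in range(n))  # left Riemann sum
    return (c_p * (1.0 - F(T)) + c_f * F(T)) / integral

# Crude grid search for the optimal replacement age
grid = [0.5 + 0.1 * i for i in range(200)]
T_star = min(grid, key=cost_rate)
```

With an increasing failure rate and c_f > c_p, g(T) has an interior minimum: replacing too early wastes useful life, while replacing too late pays the corrective cost too often.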
2.2.2 Block Replacement Policies
In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid replacing a component that has just been replaced, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to capture that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
2.2.3 Condition Based Maintenance
CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation with failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.
One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.
For components subject to inspection, one must decide at each decision epoch if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times, and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 4) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.
An age replacement policy model that takes into account the information from condition monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (monitored variables).
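The proportional hazards assumption can be written h(t, z) = h0(t) · exp(γ·z), a baseline hazard in time scaled by the monitored covariates z. A minimal sketch follows; the Weibull baseline and the coefficient value are assumptions for illustration, not parameters from [25]:

```python
import math

def baseline(t, beta=2.0, eta=8.0):
    # Weibull baseline hazard h0(t) (assumed form and parameters)
    return (beta / eta) * (t / eta) ** (beta - 1)

def hazard(t, z, gamma=(0.4,)):
    # h(t, z) = h0(t) * exp(gamma . z): covariates scale the baseline
    return baseline(t) * math.exp(sum(g * zi for g, zi in zip(gamma, z)))
```

A covariate of zero recovers the baseline, while a degraded condition reading (larger z) multiplies the hazard at every age by the same factor, which is exactly the separation of time and condition effects described above.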
2.2.4 Opportunistic Maintenance Models
Opportunistic maintenance considers unexpected opportunities of performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: travel to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money could be saved.
Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.
A rolling horizon dynamic programming algorithm is proposed in [45] to take short-term information into account. The model can be used for many maintenance optimization models.
2.2.5 Other Types of Models and Criteria of Classification
Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.
Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g., multi-component models are more interesting in power systems. The time horizon considered in the model is also important: many articles consider an infinite time horizon, but more focus should be put on finite horizons, since they are more practical. Another characteristic of a model is its time representation, discrete or continuous. A distinction can also be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.
The method used for solving the problem has an influence on the solution; a model that cannot be solved is of no interest. For some models exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power
System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.
3.1 Power System Presentation
Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, transmission and distribution systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description
A simple description of the power system includes the following main parts:
1. Generation. These are the generating units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.
2. Transmission. The transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.
3. Distribution. The distribution system is a voltage level below transmission which is connected to customers. It connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).
4. Consumption. The consumers can be divided into different categories: industry, commercial, household, office, agriculture, etc. The interruption costs are in general different for the different categories of consumers. These costs also depend on the time and duration of the outage.
The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.
The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). The components of the system influence each other: if a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.
3.1.2 Maintenance in Power Systems
The objective is to find the right way to do maintenance: Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).
Research on power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more
attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
3.2 Costs
Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: Cost for the maintenance team that performs maintenance actions.
• Spare part cost: The cost of a new component is an important part of the maintenance cost.
• Maintenance equipment cost: Special equipment may be needed for undertaking the maintenance. A helicopter can for example be necessary for the maintenance of some parts of an offshore wind turbine.
• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.
• Unserved energy/interruption cost: If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.
• Inspection/monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: The size and availability of the maintenance staff is limited.
• Maintenance equipment: The equipment needed for undertaking the maintenance must be available.
• Weather: The weather can force certain maintenance actions to be postponed; e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.
• Availability of spare parts: If the needed spare parts are not available, maintenance can not be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.
• Maintenance contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.
• Availability of condition monitoring information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.
• Statistical data: Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that satisfies the principle of optimality:
An optimal policy has the property that whatever the initial state and initial decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not influence the future evolution of the system or the possible actions.
In maintenance problems, this basically means that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.
If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.
Functional failures are in general represented as stochastic events. Consequently, stochastic maintenance optimization models are of interest.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the interval of time between two stages will have an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented together with the value iteration algorithm to solve it. The section is illustrated with a classical example, a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage $k$, the system is in a state $X_k = i$ that belongs to a state space $\Omega_k^X$. Depending on the state of the system, the decision maker decides on an action $u = U_k \in \Omega_k^U(i)$.
Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be $X_{k+1} = f_k(i, u)$. Moreover, the action has a cost that the decision maker has to pay, $C_k(i, u)$. A possible terminal cost $C_N(X_N)$ is associated with the terminal state (the state at stage $N$).
Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:
$$J_0^*(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, \ldots, N-1$.
$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision/action at stage $k$
$C_k(i, u)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u)$: dynamic function
$J_0^*(i)$: optimal cost-to-go starting from state $i$
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage $k$ can be derived with the following formula:
$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \qquad (4.1)$$
$J_k^*(i)$: optimal cost-to-go from stage $k$ to $N$, starting from state $i$
The value iteration algorithm is a direct consequence of the optimality equation:
$$J_N^*(i) = C_N(i) \quad \forall i \in \Omega_N^X$$
$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_k^X$$
$$U_k^*(i) = \arg\min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_k^X$$

$u$: decision variable
$U_k^*(i)$: optimal decision/action at stage $k$ for state $i$
The algorithm goes backwards, starting from the last stage. It stops when $k = 0$.
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: a five-stage shortest path network. Node A (stage 0) connects to nodes B, C, D (stage 1), which connect to E, F, G (stage 2), then H, I, J (stage 3), and finally K (stage 4). Each arc is labeled with its cost.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: $n = 5$, $k = 0, 1, 2, 3, 4$.
State Space
The state space is defined for each stage:
$\Omega_0^X = \{A\} = \{0\}$
$\Omega_1^X = \{B, C, D\} = \{0, 1, 2\}$
$\Omega_2^X = \{E, F, G\} = \{0, 1, 2\}$
$\Omega_3^X = \{H, I, J\} = \{0, 1, 2\}$
$\Omega_4^X = \{K\} = \{0\}$
Each node of the problem is defined by a state $X_k$. For example, $X_2 = 1$ corresponds to node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which $X_k$ would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:
$$\Omega_k^U(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \qquad \text{for } k = 1, 2, 3$$
$$\Omega_0^U(0) = \{0, 1, 2\} \qquad \text{for } k = 0$$
For example, $\Omega_1^U(0) = \Omega^U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition $B \Rightarrow E$ or $U_1(0) = 1$ for the transition $B \Rightarrow F$.
Another example: $\Omega_1^U(2) = \Omega^U(D) = \{1, 2\}$, with $u_1(2) = 1$ for the transition $D \Rightarrow F$ or $u_1(2) = 2$ for the transition $D \Rightarrow G$.
A sequence $\pi = \{\mu_0, \mu_1, \ldots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state $i$ at stage $k$ to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu_0^*, \mu_1^*, \ldots, \mu_N^*\}$.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notation used: $f_k(i, u) = u$.
The transition costs are defined as the distance from one state to the state resulting from the decision. For example, $C_1(0, 0) = C(B \Rightarrow E) = 4$. The cost function is defined in the same way for the other stages and states.
Objective Function

$$J_0^*(0) = \min_{U_k \in \Omega_k^U(X_k)} \left[ \sum_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, 1, \ldots, 3$.
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem.
The algorithm is initiated at the last stage and then iterated backwards until
the initial state is reached. The optimal decision sequence is then obtained forwards, using the optimal decisions determined by the DP algorithm for the sequence of states that is visited.
The solution of the algorithm is given in Appendix A.
The optimal cost-to-go is $J_0^*(0) = 8$. It corresponds to the path $A \Rightarrow D \Rightarrow G \Rightarrow I \Rightarrow K$. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$ with $\mu_k(i) = u_k^*(i)$ (for example $\mu_1(1) = 2$, $\mu_1(2) = 2$).
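The backward value iteration of Section 4.2.2 can be sketched in a few lines of code. Since the arc costs in the figure are only partly recoverable here, the table below uses illustrative costs chosen to be consistent with the values stated in the text ($C(A \Rightarrow B) = 2$, $C(B \Rightarrow E) = 4$, the path A-B-F-J-K costing 17, and the optimum $J_0^*(0) = 8$ along $A \Rightarrow D \Rightarrow G \Rightarrow I \Rightarrow K$); the function and table names are this sketch's own.

```python
# Backward value iteration for the shortest path example of Section 4.2.3.
# cost[k][i][u]: cost of decision u (= next-stage state) from state i at
# stage k. States are indexed 0, 1, 2 as in the text; the arc costs are
# illustrative, chosen to reproduce the facts quoted in the text.
cost = [
    {0: {0: 2, 1: 4, 2: 3}},                   # stage 0: A -> B, C, D
    {0: {0: 4, 1: 6},                          # stage 1: B -> E, F
     1: {0: 2, 1: 1, 2: 3},                    #          C -> E, F, G
     2: {1: 5, 2: 2}},                         #          D -> F, G
    {0: {0: 2, 1: 5},                          # stage 2: E -> H, I
     1: {0: 7, 1: 3, 2: 2},                    #          F -> H, I, J
     2: {1: 1, 2: 2}},                         #          G -> I, J
    {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},         # stage 3: H, I, J -> K
]
N = len(cost)                                  # number of stages

def value_iteration(cost):
    """Return (J, U): optimal cost-to-go and optimal decision tables."""
    J = [{} for _ in range(N + 1)]
    U = [{} for _ in range(N)]
    J[N] = {0: 0.0}                            # terminal cost C_N(K) = 0
    for k in range(N - 1, -1, -1):             # backward recursion
        for i, actions in cost[k].items():
            u_best = min(actions, key=lambda u: actions[u] + J[k + 1][u])
            U[k][i] = u_best
            J[k][i] = actions[u_best] + J[k + 1][u_best]
    return J, U

J, U = value_iteration(cost)
print("optimal cost-to-go:", J[0][0])          # -> optimal cost-to-go: 8.0
```

With these costs, reading $U$ forwards from the initial state recovers the optimal path $A \Rightarrow D \Rightarrow G \Rightarrow I \Rightarrow K$.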
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below.
State Space
A variable $k \in \{0, \ldots, N\}$ represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on $k$: $X_k \in \Omega_k^X$.
Decision Space
At each decision epoch, the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on
the stage: $u \in \Omega_k^U(i)$.
Dynamics of the System and Transition Probabilities
In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance $\omega = \omega_k(i, u)$:
$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \ldots, N-1$$
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage $k+1$ is $j$, given that the state and control at stage $k$ are $i$ and $u$. These probabilities can also depend on the stage:
$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$
If the system is stationary (time-invariant), the dynamic function $f$ does not depend on time, and the notation for the probability function can be simplified:
$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$
In this case, one refers to a Markov decision process. If a control $u$ is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition $(i, j)$ and action $u$. The costs can also depend on the stage:
$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$
If the transition $(i, j)$ occurs at stage $k$ when the decision is $u$, then the cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, the notation is simplified to $C(j, u, i)$.
A terminal cost $C_N(i)$ can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that minimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:
$$J^*(X_0) = \min_{U_k \in \Omega_k^U(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k))$, $k = 0, 1, \ldots, N-1$.
24
$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision/action at stage $k$
$\omega_k(i, u)$: probabilistic function of the disturbance
$C_k(j, u, i)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u, \omega)$: dynamic function
$J_0^*(i)$: optimal cost-to-go starting from state $i$
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:
$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} E\left[ C_k(i, u) + J_{k+1}^*(f_k(i, u, \omega)) \right] \qquad (5.1)$$
This equation defines a condition for the cost-to-go function of a state $i$ at stage $k$ to be optimal. The equation can be rewritten using the transition probabilities:
$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \sum_{j \in \Omega_{k+1}^X} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \qquad (5.2)$$
$\Omega_k^X$: state space at stage $k$
$\Omega_k^U(i)$: decision space at stage $k$ for state $i$
$P_k(j, u, i)$: transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on Equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
$$J_N^*(i) = C_N(i) \quad \forall i \in \Omega_N^X \quad \text{(initialization)}$$

While $k \ge 0$ do:
$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \sum_{j \in \Omega_{k+1}^X} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \quad \forall i \in \Omega_k^X$$
$$U_k^*(i) = \arg\min_{u \in \Omega_k^U(i)} \sum_{j \in \Omega_{k+1}^X} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \quad \forall i \in \Omega_k^X$$
$$k \leftarrow k - 1$$
$u$: decision variable
$U_k^*(i)$: optimal decision/action at stage $k$ for state $i$
The recursion finishes when the first stage is reached
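As a minimal illustration of this backward recursion, the sketch below applies finite horizon stochastic value iteration to a hypothetical component-replacement problem: three states (0 = new, 1 = worn, 2 = failed) and two actions (0 = do nothing, 1 = replace). All probabilities, costs, and names are invented for the example and are not taken from the thesis.

```python
# Finite horizon stochastic value iteration (Section 5.3) on a
# hypothetical component-replacement problem. P[u][i][j] is the
# stationary transition probability P(X_{k+1}=j | X_k=i, U_k=u);
# C(j, u, i) is the transition cost. All numbers are illustrative.
N = 4                                   # number of stages
STATES, ACTIONS = (0, 1, 2), (0, 1)     # 0=new, 1=worn, 2=failed

P = {
    0: {0: (0.7, 0.2, 0.1),             # do nothing: gradual deterioration
        1: (0.0, 0.6, 0.4),
        2: (0.0, 0.0, 1.0)},            # a failed component stays failed
    1: {i: (0.7, 0.2, 0.1) for i in STATES},   # replace: as good as new
}

def C(j, u, i):
    """Replacement cost plus a penalty for landing in the failed state."""
    return (10.0 if u == 1 else 0.0) + (50.0 if j == 2 else 0.0)

def value_iteration():
    """Backward recursion; returns stage-0 cost-to-go and the policy."""
    J = {i: 0.0 for i in STATES}        # terminal cost C_N = 0
    policy = []
    for k in range(N - 1, -1, -1):
        Q = {i: {u: sum(P[u][i][j] * (C(j, u, i) + J[j]) for j in STATES)
                 for u in ACTIONS} for i in STATES}
        policy.insert(0, {i: min(Q[i], key=Q[i].get) for i in STATES})
        J = {i: min(Q[i].values()) for i in STATES}
    return J, policy

J, policy = value_iteration()
```

For this small instance, the optimal policy can be cross-checked by brute force over all stage-dependent decision rules, which is exactly what the curse of dimensionality rules out for larger models.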
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with
• $N$ stages,
• $N_X$ state variables; the size of the set for each state variable is $S$,
• $N_U$ control variables; the size of the set for each control variable is $A$.
The time complexity of the value iteration algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem thus increases exponentially with the size of the problem (the number of state and decision variables). This characteristic of SDP is called the curse of dimensionality.
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for a component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be taken into account to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.
Of course, maintenance states should be considered in both cases. It could also be possible to have different types of failure states, such as major and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. If there is no consumption, some generation units are stopped; this time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing the maintenance actions of offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).
This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
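One way to sketch such a one-stage time lag is to augment the state to the pair (current level, previous level), so that the Markov property holds again for the augmented state. Both functions below and all numbers are hypothetical, invented for illustration only.

```python
# Sketch: restoring the Markov property when the deterioration dynamic
# depends on the preceding level as well (Section 5.5.3). The state is
# augmented to a pair (x_k, x_{k-1}); the dynamic is illustrative.

def augmented_transition(state, u, base_transition):
    """state = (x_k, x_{k-1}); returns the augmented next state."""
    x_now, x_prev = state
    x_next = base_transition(x_now, x_prev, u)   # dynamic with one-stage memory
    return (x_next, x_now)                       # the memory shifts forward

def base_transition(x_now, x_prev, u):
    """Hypothetical dynamic: wear accelerates after a large jump."""
    if u == 1:                                   # maintenance resets the level
        return 0
    return x_now + max(1, x_now - x_prev)        # faster wear if it just jumped

s = (2, 1)                                       # level 2 now, level 1 before
s = augmented_transition(s, 0, base_transition)  # -> (3, 2)
```

The augmented state space is the product of the original one with itself, which is exactly the computational price mentioned above.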
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, the cost function, and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process (MDP). For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice, one scarcely faces problems with an infinite number of stages. An infinite horizon can, however, be a reasonable approximation for problems with a very large number of stages, for which the finite horizon value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form $\pi = \{\mu, \mu, \mu, \ldots\}$, where $\mu$ is a function mapping the state space to the control space. For
$i \in \Omega^X$, $\mu(i)$ is an admissible control for the state $i$: $\mu(i) \in \Omega^U(i)$.
The objective is to find the optimal policy $\mu^*$, which minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Three types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.
$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$.
$\mu$: decision policy
$J^*(i)$: optimal cost-to-go function for state $i$
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor $\alpha$ ($0 < \alpha < 1$). The cost at stage $k$ for discounted IHSDP has the form $\alpha^k \cdot C_{ij}(u)$.
As $C_{ij}(u)$ is bounded, the infinite sum converges (it is dominated by a decreasing geometric progression).
$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$.
$\alpha$: discount factor
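The convergence argument can be made explicit: if the stage costs are bounded, $|C_{ij}(u)| \le C_{\max}$ for all $i$, $j$, $u$, then the discounted sum is dominated by a geometric series:

```latex
\Bigl| \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \, C(X_{k+1}, \mu(X_k), X_k) \Bigr|
\;\le\; \sum_{k=0}^{\infty} \alpha^k \, C_{\max}
\;=\; \frac{C_{\max}}{1-\alpha} \;<\; \infty ,
\qquad 0 < \alpha < 1 .
```

This bound also explains why solution methods for discounted problems slow down as $\alpha \to 1$: the quantity $1/(1-\alpha)$ appears in their complexity.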
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounted costs.
To keep the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:
$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$.
6.2 Optimality Equations
The optimality equations are formulated using the probability function $P(j, u, i)$.
The stationary policy $\mu^*$ that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):
$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \quad \forall i \in \Omega^X$$

$J_\mu(i)$: cost-to-go function of policy $\mu$ starting from state $i$
$J^*(i)$: optimal cost-to-go function for state $i$
For an IHSDP discounted problem, the optimality equation is:
$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X$$
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown that it converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and $\frac{1}{1-\alpha}$.
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined for the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the
algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively. The process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy $\mu^0$. It can then be described by the following steps:
Step 1: Policy Evaluation

If $\mu^{q+1} = \mu^q$, stop the algorithm. Else, $J_{\mu^q}(i)$, the solution of the following linear system, is calculated:

$$J_{\mu^q}(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J_{\mu^q}(j) \right] \quad \forall i \in \Omega^X$$

$q$: iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system under the policy $\mu^q$.
Step 2: Policy Improvement

A new policy is obtained as in the value iteration algorithm:

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^q}(j) \right] \quad \forall i \in \Omega^X$$

Go back to the policy evaluation step.
The process stops when $\mu^{q+1} = \mu^q$.
At each iteration the algorithm improves the policy. If the initial policy $\mu^0$ is already good, then the algorithm will converge fast to the optimal solution.
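The two steps above can be sketched on a small hypothetical MDP. A discounted cost (factor $\alpha$) is used here so the evaluation step has a simple fixed point, and the exact linear solve is replaced by a long fixed-point iteration; the two states, two actions (action 1 being a costly repair), and all numbers are this sketch's own choices, not the thesis's.

```python
# Policy iteration (Section 6.4) on a hypothetical two-state, two-action
# discounted MDP. P[u][i] is the row of transition probabilities from
# state i under action u; C(j, u, i) follows the notation of the text.
STATES, ACTIONS, ALPHA = (0, 1), (0, 1), 0.9

P = {0: {0: (0.9, 0.1), 1: (0.2, 0.8)},     # do nothing: state 1 is sticky
     1: {0: (0.9, 0.1), 1: (0.9, 0.1)}}     # repair: back to mostly-good
def C(j, u, i):
    return (5.0 if u == 1 else 0.0) + (10.0 if j == 1 else 0.0)

def evaluate(mu):
    """Step 1: approximate the linear system J = C_mu + alpha * P_mu * J
    by fixed-point iteration (a stand-in for an exact linear solve)."""
    J = {i: 0.0 for i in STATES}
    for _ in range(2000):
        J = {i: sum(P[mu[i]][i][j] * (C(j, mu[i], i) + ALPHA * J[j])
                    for j in STATES) for i in STATES}
    return J

def policy_iteration():
    mu = {i: 0 for i in STATES}             # initial policy: do nothing
    while True:
        J = evaluate(mu)                    # step 1: policy evaluation
        new = {i: min(ACTIONS,              # step 2: policy improvement
                      key=lambda u: sum(P[u][i][j] * (C(j, u, i) + ALPHA * J[j])
                                        for j in STATES))
               for i in STATES}
        if new == mu:                       # a policy that is a solution
            return mu, J                    # of its own improvement
        mu = new

mu, J = policy_iteration()
```

On this instance the algorithm stops after very few improvement steps, which matches the empirical observation about PI quoted later in this chapter.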
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, in each evaluation step, the value iteration algorithm for a finite number of iterations $M$ to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu^k}(i)$ that must be chosen higher than the real value $J_{\mu^k}(i)$.
While $m \ge 0$ do:
$$J^m_{\mu^k}(i) = \sum_{j \in \Omega^X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right] \quad \forall i \in \Omega^X$$
$$m \leftarrow m - 1$$

$m$: number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when $m = 0$, and $J_{\mu^k}$ is approximated by $J^0_{\mu^k}$.
6.6 Average Cost-to-go Problems
The methods presented in the previous sections can not be applied directly to average cost problems. Average cost-to-go problems are more complicated, and the convergence of the algorithms requires conditions on the Markov decision process. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy $\mu$ and a state $\bar{X} \in \Omega^X$, there is a unique scalar $\lambda_\mu$ and vector $h_\mu$ such that:

$$h_\mu(\bar{X}) = 0$$
$$\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega^X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right] \quad \forall i \in \Omega^X$$

This $\lambda_\mu$ is the average cost-to-go for the stationary policy $\mu$. The average cost-to-go is the same for all starting states.
The optimal average cost and the optimal policy satisfy the Bellman equation:

$$\lambda^* + h^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$
$$\mu^*(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. $\bar{X}$ is an arbitrary state and $h^0(i)$ is chosen
arbitrarily:
$$H^k = \min_{u \in \Omega^U(\bar{X})} \sum_{j \in \Omega^X} P(j, u, \bar{X}) \cdot \left[ C(j, u, \bar{X}) + h^k(j) \right]$$
$$h^{k+1}(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \quad \forall i \in \Omega^X$$
$$\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \quad \forall i \in \Omega^X$$
The sequence $h^k$ will converge if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.
Initialization: $\bar{X}$ can be chosen arbitrarily.
Step 1: Policy evaluation
If $\lambda^{q+1} = \lambda^q$ and $h^{q+1}(i) = h^q(i)$ $\forall i \in \Omega^X$, stop the algorithm. Else, solve the system of equations:

$$h^q(\bar{X}) = 0$$
$$\lambda^q + h^q(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right] \quad \forall i \in \Omega^X$$
Step 2: Policy improvement

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right] \quad \forall i \in \Omega^X$$
$$q \leftarrow q + 1$$
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
34
For example, in the discounted IHSDP case:

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X$$

$J^*(i)$ is the solution of the following linear programming model:

Maximize $\sum_{i \in \Omega^X} J(i)$

Subject to $J(i) \le \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J(j) \right]$, $\forall i \in \Omega^X$, $\forall u \in \Omega^U(i)$
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
Let $n$ and $m$ denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of $n$ and $m$; a DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm always improve the policy at each iteration thealgorithm will converge quite fast if the initial policy micro0 is already good There isstrong empirical evidence in favor of PI over VI and LP in solving Markov decisionprocesses [28]
6.9 Semi-Markov Decision Processes

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem refers to semi-Markov decision processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite, and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Processes -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. This chapter presents methods that overcome this problem by approximation. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ, using samples resulting from the use of this policy. The method can be used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a way similar to the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.
Policy evaluation by simulation. Assume that a trajectory (X0, ..., XN) has been generated according to the policy µ, and that the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, µ(Xk)) has been observed.

The cost-to-go resulting from the trajectory starting from the state Xk is:

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V(Xk): cost-to-go of a trajectory starting from state Xk.

If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} V(i, m)

V(i, m): cost-to-go of a trajectory starting from state i after the m-th visit.

A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i, m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(Xk) := J(Xk) + γ_{Xk} · [V(Xk) − J(Xk)]

γ_{Xk} corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) must be computed from the whole trajectory, and can therefore be used only once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = C(Xk, Xk+1) + V(Xk+1).

At each transition of the trajectory, the cost-to-go function of the states visited so far is updated. Assume that the l-th transition has just been generated. Then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) := J(Xk) + γ_{Xk} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l

TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) := J(Xk) + γ_{Xk} · λ^{l−k} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0; the TD(0) algorithm is:

J(Xk) := J(Xk) + γ_{Xk} · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
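The TD(0) update can be sketched on a deliberately tiny example. Everything below is hypothetical: a 3-state stochastic shortest path chain (state 2 terminal) under a fixed policy, with step sizes γ = 1/m as described above.

```python
import random

# Hypothetical stochastic shortest path chain under a fixed policy:
# from state 0, go to state 1 (cost 1) or straight to terminal state 2
# (cost 3), each with probability 0.5; state 1 always reaches 2 with cost 1.
random.seed(0)

def step(x):
    if x == 0:
        return (1, 1.0) if random.random() < 0.5 else (2, 3.0)
    return (2, 1.0)

J = {0: 0.0, 1: 0.0, 2: 0.0}       # tabular cost-to-go estimate, J(2) = 0
visits = {0: 0, 1: 0, 2: 0}
for _ in range(20000):             # many simulated trajectories from state 0
    x = 0
    while x != 2:
        nxt, cost = step(x)
        visits[x] += 1
        gamma = 1.0 / visits[x]                 # step size 1/m
        J[x] += gamma * (cost + J[nxt] - J[x])  # TD(0) update
        x = nxt
print(J)   # exact values for this chain: J(0) = 2.5, J(1) = 1.0
```

With these step sizes each J(x) is simply the running average of the observed one-step targets, so the estimates settle on the exact cost-to-go of the chain.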
Q-factors. Once J_{µk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Q_{µk}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{µk}(j)]

Note that C(j, u, i) must be known. The improved policy is:

µ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q_{µk}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{µk} and Q_{µk} have been estimated from the samples.
7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations we obtain:

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3):

Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do:

Uk = argmin_{u∈Ω_U(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈Ω_U(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
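A minimal sketch of the update based on (7.3) follows. The problem data are invented (a two-state shortest-path toy with terminal state 2), and an ε-greedy rule is used for choosing the controls, which anticipates the exploration/exploitation trade-off discussed next.

```python
import random

# Hypothetical shortest-path toy: states {0, 1}, terminal state 2.
# From state 0, action 0 moves to 1 (cost 1), action 1 jumps to 2 (cost 3);
# from state 1 both actions reach 2 with cost 1.
random.seed(1)

def step(x, u):
    if x == 0:
        return (1, 1.0) if u == 0 else (2, 3.0)
    return (2, 1.0)

Q = {(x, u): 0.0 for x in (0, 1) for u in (0, 1)}
visits = {k: 0 for k in Q}
for _ in range(5000):
    x = 0
    while x != 2:
        # epsilon-greedy: explore with probability 0.3, else greedy control
        u = random.choice((0, 1)) if random.random() < 0.3 else \
            min((0, 1), key=lambda a: Q[(x, a)])
        nxt, cost = step(x, u)
        visits[(x, u)] += 1
        gamma = 1.0 / visits[(x, u)]           # step size as for TD
        target = cost + (0.0 if nxt == 2 else
                         min(Q[(nxt, a)] for a in (0, 1)))
        Q[(x, u)] += gamma * (target - Q[(x, u)])   # update based on (7.3)
        x = nxt
print(Q)   # expect Q(0,0) near 2, Q(0,1) near 3, Q(1,.) near 1
```

The learned Q-factors identify action 0 in state 0 as optimal (expected cost 2 versus 3), without any explicit transition model being built.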
The trade-off between exploration and exploitation. Convergence of the algorithms to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning

An on-line application can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jµ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of Jµ. In the tabular representation previously considered, Jµ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jµ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Choose a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
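The steps above can be sketched for the simplest structure, a linear approximation J̃(i, r) = φ(i)ᵀr trained by least squares. The features, the quadratic "true" cost-to-go and all numbers below are hypothetical, chosen only to illustrate the train/test workflow.

```python
import numpy as np

# Sketch: approximate a cost-to-go function over a large state space with a
# linear structure J~(i, r) = phi(i)^T r, trained by least squares.
rng = np.random.default_rng(0)

def phi(i):
    return np.array([1.0, i, i * i])          # input features of the state

def true_J(i):                                # hypothetical target function
    return 2.0 + 0.5 * i + 0.1 * i ** 2

states = rng.integers(0, 100, size=200)       # gather a training set
targets = true_J(states) + rng.normal(0.0, 1.0, size=200)  # noisy samples

Phi = np.stack([phi(i) for i in states])
r, *_ = np.linalg.lstsq(Phi, targets, rcond=None)          # train r

test_states = np.arange(0, 100, 7)            # evaluate on a test set
Phi_test = np.stack([phi(i) for i in test_states])
err = np.max(np.abs(Phi_test @ r - true_J(test_states)))
print(r, err)
```

Only the three numbers in r are stored, instead of one table entry per state, which is the point of the approximation structure.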
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either from simulation or from real-time samples. This is already an approximation of the real function.
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The grouped maintenance activities are then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating-unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the calculated state probabilities and optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary.
  Possible application in maintenance optimization: short-term maintenance scheduling.
  Method: value iteration. Disadvantage: limited state space (number of components).

Markov Decision Processes
  Characteristics: stationary model; average cost-to-go, discounted and shortest-path formulations.
  Possible application: continuous-time condition-monitoring maintenance optimization; short-term maintenance optimization.
  Methods (classical methods for MDPs): value iteration (VI), which can converge fast for a high discount factor; policy iteration (PI), faster in general; linear programming, which allows additional constraints but is limited to smaller state spaces than VI and PI.

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval; complex (average cost-to-go approach).
  Possible application: optimization of inspection-based maintenance.
  Methods: same as MDPs.

Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than classical MDP methods; can work without an explicit model.
  Possible application: same as MDPs, for larger systems.
  Methods: TD-learning, Q-learning.
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that can influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance in a profitable period. This idea was adopted in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in low electricity prices for the rest of the year. This observation can be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption can be used as a base for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers:
NE       Number of electricity scenarios
NW       Number of working states for the component
NPM      Number of preventive maintenance states for the component
NCM      Number of corrective maintenance states for the component

Costs:
CE(s, k)  Electricity cost at stage k for electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables:
i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage

State and control space:
x1_k   Component state at stage k
x2_k   Electricity state at stage k

Probability functions:
λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets:
Ω_x1     Component state space
Ω_x2     Electricity state space
Ω_U(i)   Decision space for state i

State notations:
W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state
9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario; NX = 2.

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k)ᵀ,  x1_k ∈ Ω_x1, x2_k ∈ Ω_x2   (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; Tmax can then, for example, correspond to the age at which λ(t) exceeds 50%. This second approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
Figure 9.1: Example of a Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1. (The diagram shows the states W0, ..., W4, PM1, CM1 and CM2; under u = 0, each state Wq leads to the next W state with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q), while the PM and CM states evolve deterministically.)
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}
Electricity scenario state

The electricity scenarios are associated with the state variable x2_k. There are NE possible states for this variable, each corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3. (The diagram plots the electricity price in SEK/MWh, ranging from 200 to 500, against the stage, around stages k − 1, k and k + 1, for scenarios 1, 2 and 3.)
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW};  Ω_U(i) = ∅ otherwise.
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probabilities

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities (only non-zero probabilities are listed)

i1                          u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0   CM1      λ(Wq)
W_NW                        0   W_NW     1 − λ(W_NW)
W_NW                        0   CM1      λ(W_NW)
Wq, q ∈ {0, ..., NW}        1   PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1    1
PM_{NPM−1}                  ∅   W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1    1
CM_{NCM−1}                  ∅   W0       1
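As an illustration, these stationary component transition matrices can be assembled numerically. The sketch below uses the example dimensions of Figure 9.1 (NW = 4, NPM = 2, NCM = 3); the failure-rate values and Ts are hypothetical, and the failure probability in a stage is taken as Ts·λ(q·Ts) as in the figure.

```python
import numpy as np

# Sketch: component transition matrices of Table 9.1 for the example of
# Figure 9.1 (NW = 4, NPM = 2, NCM = 3); lam values and Ts are hypothetical.
NW, NPM, NCM, Ts = 4, 2, 3, 1.0
lam = [0.01, 0.02, 0.05, 0.10, 0.20]       # assumed lam(q*Ts) for q = 0..NW

# State ordering: W0..W_NW, PM1..PM_{NPM-1}, CM1..CM_{NCM-1}
# (PM_NPM and CM_NCM coincide with W0, as in the text).
states = ([f"W{q}" for q in range(NW + 1)]
          + [f"PM{q}" for q in range(1, NPM)]
          + [f"CM{q}" for q in range(1, NCM)])
idx = {s: k for k, s in enumerate(states)}
n = len(states)

P0 = np.zeros((n, n))                      # u = 0: no preventive maintenance
for q in range(NW + 1):
    p_fail = Ts * lam[q]                   # failure probability in the stage
    P0[idx[f"W{q}"], idx[f"W{min(q + 1, NW)}"]] = 1 - p_fail
    P0[idx[f"W{q}"], idx["CM1"]] = p_fail
P0[idx["PM1"], idx["W0"]] = 1.0            # maintenance runs to completion
P0[idx["CM1"], idx["CM2"]] = 1.0
P0[idx["CM2"], idx["W0"]] = 1.0

P1 = P0.copy()                             # u = 1: preventive replacement
for q in range(NW + 1):
    P1[idx[f"W{q}"]] = 0.0                 # clear the W rows, then set PM1
    P1[idx[f"W{q}"], idx["PM1"]] = 1.0
print(P0.sum(axis=1), P1.sum(axis=1))      # every row sums to one
```

The row-sum check is a useful sanity test when the table is translated into matrices: every state must transition somewhere with total probability one.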
Table 9.2: Example of transition matrices for electricity scenarios

P1E = | 1  0  0 |      P2E = | 1/3  1/3  1/3 |      P3E = | 0.6  0.2  0.2 |
      | 0  1  0 |            | 1/3  1/3  1/3 |            | 0.2  0.6  0.2 |
      | 0  0  1 |            | 1/3  1/3  1/3 |            | 0.2  0.2  0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
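Scenario dynamics of this kind can be sampled directly; the small sketch below uses the matrices and stage schedule of Tables 9.2 and 9.3 (the seed and variable names are illustrative).

```python
import numpy as np

# Sketch: sampling the electricity-scenario state over the 12-stage horizon
# of Table 9.3, with the matrices of Table 9.2 (row i2, column j2).
rng = np.random.default_rng(2)

P1E = np.eye(3)                              # identity: scenario is stable
P2E = np.full((3, 3), 1.0 / 3.0)             # fully mixing
P3E = np.array([[0.6, 0.2, 0.2],
                [0.2, 0.6, 0.2],
                [0.2, 0.2, 0.6]])
schedule = [P1E, P1E, P1E, P3E, P3E, P2E,
            P2E, P2E, P3E, P1E, P1E, P1E]    # stages 0..11 as in Table 9.3

def simulate(s0):
    """Sample one scenario trajectory over the 12 stages, starting from s0."""
    path = [s0]
    for Pk in schedule:
        path.append(rng.choice(3, p=Pk[path[-1]]))
    return path

path = simulate(0)
print(path)   # scenario index (0..2) at stages 0..12
```

Since the first three matrices are the identity, the scenario cannot change before stage 3; the transient matrices then model the summer period described in Section 9.1.1.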
9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                          u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0   CM1      CI + CCM
W_NW                        0   W_NW     G · Ts · CE(i2, k)
W_NW                        0   CM1      CI + CCM
Wq                          1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1    CI + CPM
PM_{NPM−1}                  ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1    CI + CCM
CM_{NCM−1}                  ∅   W0       CI + CCM
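The backward value-iteration recursion that solves this model, J_k(i) = min_u Σ_j P(j, u, i) · [Ck(j, u, i) + J_{k+1}(j)] with J_N(i) = CN(i), can be sketched on a deliberately reduced toy instance: three component states, a single electricity scenario (so the revenue is a constant), and hypothetical cost numbers. The generation reward is entered as a negative cost, so that the DP minimizes net cost.

```python
import numpy as np

# Toy instance (all numbers hypothetical): component states 0 = W0 (new),
# 1 = W1 (aged), 2 = CM1 (under repair); one electricity scenario.
N = 12                                      # stages
REV = 100.0                                 # G * Ts * CE per producing stage
CI, CPM, CCM, lam1 = 20.0, 30.0, 80.0, 0.3  # interruption/PM/CM costs, lam(W1)

def stage(i, u):
    """Transition probabilities over (W0, W1, CM1) and expected stage cost."""
    if i == 2:                              # corrective maintenance, no choice
        return np.array([1.0, 0.0, 0.0]), CI + CCM
    if u == 1:                              # preventive replacement, one stage
        return np.array([1.0, 0.0, 0.0]), CI + CPM
    p_f = lam1 if i == 1 else 0.0           # only the aged state can fail
    return (np.array([0.0, 1.0 - p_f, p_f]),
            p_f * (CI + CCM) - (1.0 - p_f) * REV)  # failure cost vs revenue

J = np.zeros(3)                             # terminal cost C_N(i) = 0
policy = []
for k in reversed(range(N)):                # backward recursion over stages
    Jk = np.empty(3)
    uk = np.zeros(3, dtype=int)
    for i in range(3):
        actions = (0, 1) if i < 2 else (0,)
        vals = []
        for u in actions:
            p, c = stage(i, u)
            vals.append(c + p @ J)          # expected cost-to-go of action u
        uk[i] = int(np.argmin(vals))
        Jk[i] = min(vals)
    J = Jk
    policy.insert(0, uk)
print(J, policy[0])                         # J_0 and the stage-0 decisions
```

The recursion returns both the optimal cost-to-go at stage 0 (negative here, i.e. a net profit) and a decision table per stage and state, which is exactly the output needed from the proposed model.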
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the structure needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

NC      Number of components
NWc     Number of working states for component c
NPMc    Number of Preventive Maintenance states for component c
NCMc    Number of Corrective Maintenance states for component c

Costs

CPMc    Cost per stage of Preventive Maintenance for component c
CCMc    Cost per stage of Corrective Maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  Electricity state at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
jNC+1                  Electricity state at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xc,k, c ∈ {1, ..., NC}  State of component c at stage k
xc                      A component state
xNC+1,k                 Electricity state at stage k
uc,k                    Maintenance decision for component c at stage k

Probability functions

λc(i)   Failure probability function for component c

Sets

Ωxc       State space for component c
ΩxNC+1    Electricity state space
Ωuc(ic)   Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh is produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
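The second assumption, a perfectly known failure rate λc(t), yields the per-stage failure probability by integrating the rate over the stage. A minimal sketch, in which the Weibull-style rate and all parameter values are placeholders rather than thesis data:

```python
import math

# Sketch: conditional per-stage failure probability from an assumed-known
# failure rate function lambda_c(t). The increasing rate below is a
# placeholder, not data from the thesis.

def stage_failure_probability(lam, k, Ts=24.0, n=200):
    """P(failure during stage k | survived until stage k)
       = 1 - exp(-integral of lam(t) over [k*Ts, (k+1)*Ts]), midpoint rule."""
    t0 = k * Ts
    integral = sum(lam(t0 + Ts * (i + 0.5) / n) for i in range(n)) * (Ts / n)
    return 1.0 - math.exp(-integral)

def lam(t):
    # placeholder wear-out (increasing) failure rate
    return 3.0e-4 * (1.0 + t / 5000.0)
```

With an increasing rate, the conditional failure probability grows with the stage index, which is what motivates preventive replacement.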
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

    Xk = (x1,k, ..., xNC,k, xNC+1,k)    (9.2)

xc,k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component Space
The numbers of CM and PM states for component c are NCMc and NPMc respectively. The number of W states for each component c, NWc, is decided in the same way as for the one-component model.

The state space related to component c is noted Ωxc:

    xc,k ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity Space
Same as in Section 8.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

    uc,k = 0: no preventive maintenance on component c
    uc,k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

    Uk = (u1,k, u2,k, ..., uNC,k)    (9.3)

The decision space for each decision variable is defined by:

    ∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, ∅ otherwise
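The feasible decision vectors Uk are the Cartesian product of the per-component decision spaces. A small sketch, where the string state encoding is an assumption of the sketch and a component with an empty decision space is represented by a forced "no action":

```python
from itertools import product

# Sketch: enumerating the feasible decision vectors U_k as the Cartesian
# product of the per-component decision spaces Omega_uc(ic).

def decision_space(ic):
    # preventive replacement can only be decided in a working state; a
    # component in maintenance has no decision, encoded here as forced 0
    return (0, 1) if ic.startswith("W") else (0,)

def feasible_decisions(i):
    return list(product(*(decision_space(ic) for ic in i)))
```

For example, with one working and one failed component only the working one carries a real choice, so two decision vectors are feasible.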
9.2.4.3 Transition Probability

The component state variables xc are independent of the electricity state xNC+1. Consequently,

    P(Xk+1 = j | Uk = U, Xk = i)                                                (9.4)
    = P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1 | iNC+1)     (9.5)

The transition probabilities of the electricity states, P(jNC+1 | iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix as in the example of Section 8.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: ic ∈ {W0, ..., WNWc} and uc = 0,

    P((j1, ..., jNC) | 0, (i1, ..., iNC)) = ∏(c=1..NC) P(jc | 0, ic)
Case 2

If at least one component is in maintenance, or preventive maintenance is decided for at least one component:

    P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) = ∏(c=1..NC) Pc

with

    Pc = P(jc | 1, ic)   if uc = 1 or ic ∉ {W0, ..., WNWc}
    Pc = 1               if uc = 0, ic ∈ {W0, ..., WNWc} and jc = ic
    Pc = 0               otherwise
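The two cases can be written as a short function. In this sketch the string state encoding and the one-component probability function `p_component` (standing for P(j | u, i)) are assumptions, not thesis data:

```python
# Sketch of the two-case joint transition probability for the series system.

def joint_probability(j, u, i, p_component):
    """P((j1..jNC) | (u1..uNC), (i1..iNC)) for the multi-component model."""
    # Case 1: every component working and no maintenance decided ->
    # components age independently.
    if all(ic.startswith("W") and uc == 0 for ic, uc in zip(i, u)):
        prob = 1.0
        for jc, uc, ic in zip(j, u, i):
            prob *= p_component(jc, 0, ic)
        return prob
    # Case 2: the system is stopped; components in (or entering) maintenance
    # follow their maintenance transitions, the others are frozen.
    prob = 1.0
    for jc, uc, ic in zip(j, u, i):
        if uc == 1 or not ic.startswith("W"):
            prob *= p_component(jc, 1, ic)
        else:
            prob *= 1.0 if jc == ic else 0.0
    return prob
```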
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: ic, jc ∈ {W0, ..., WNWc} and uc = 0,

    C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

    C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ(c=1..NC) Cc

with

    Cc = CCMc   if ic ∈ {CM1, ..., CM(NCMc−1)} or jc = CM1
    Cc = CPMc   if ic ∈ {PM1, ..., PM(NPMc−1)} or jc = PM1
    Cc = 0      otherwise
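The two cost cases can likewise be sketched in code; all numeric values (G, Ts, CI, the per-component maintenance costs and the price scenario CE) are placeholder assumptions:

```python
# Sketch of the two-case stage cost for the multi-component model.

G, Ts, C_I = 2000.0, 24.0, 500.0   # placeholder production, stage length, interruption cost

def CE(i_el, k):
    return 0.05                    # placeholder electricity-price scenario

def stage_cost(j, u, i, i_el, k, C_CM, C_PM):
    working = all(ic.startswith("W") for ic in i)
    no_action = all(uc == 0 for uc in u)
    no_failure = all(jc.startswith("W") for jc in j)
    if working and no_action and no_failure:
        return G * Ts * CE(i_el, k)    # Case 1: reward for the energy produced
    total = C_I                        # Case 2: interruption cost ...
    for jc, ic, ccm, cpm in zip(j, i, C_CM, C_PM):
        if ic.startswith("CM") or jc == "CM1":
            total += ccm               # ... plus CM cost for failed components
        elif ic.startswith("PM") or jc == "PM1":
            total += cpm               # ... plus PM cost for replaced components
    return total
```

As with the transition probabilities, the Case 1 value is a reward and enters the objective with the opposite sign to the Case 2 costs.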
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of individual decision spaces for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for one.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. ADP methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
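The backward recursion can also be checked mechanically. In the sketch below, `C[k][(i, u)]` holds the stage costs transcribed from the computation above, with the decision u being the index of the next-stage state:

```python
# Backward induction (finite-horizon value iteration) on the shortest-path
# example; arc costs are transcribed from the worked computation above.

C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},
    1: {(0, 0): 4, (0, 1): 6,
        (1, 0): 2, (1, 1): 1, (1, 2): 3,
        (2, 1): 5, (2, 2): 2},
    2: {(0, 0): 2, (0, 1): 5,
        (1, 0): 7, (1, 1): 3, (1, 2): 2,
        (2, 1): 1, (2, 2): 2},
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},
}

J = {4: {0: 0}}                      # terminal cost phi(0) = 0
policy = {}
for k in (3, 2, 1, 0):
    J[k], policy[k] = {}, {}
    states = {i for (i, _) in C[k]}
    for i in sorted(states):
        best_u = min((u for (s, u) in C[k] if s == i),
                     key=lambda u: C[k][(i, u)] + J[k + 1][u])
        policy[k][i] = best_u
        J[k][i] = C[k][(i, best_u)] + J[k + 1][best_u]

print(J[0][0])                       # -> 8, the shortest-path cost from A
```

The recursion reproduces J*0(0) = 8 with the first decision u*0(0) = 2, matching the hand computation.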
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that integrate stochastic behaviors explicitly. This feature makes the models interesting and was the starting idea of this work.
1.2 Objective
The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization and to identify possible future applications in power systems.
1.3 Approach
The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.

The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.

Some SDP models found in the literature were reviewed, and conclusions were drawn about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.

A finite horizon replacement model was developed to illustrate the possible use of SDP for power system maintenance.
1.4 Outline

Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.

Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.

Chapters 4-7 focus on different Dynamic Programming (DP) approaches and algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP models are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods,
respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving infinite horizon Dynamic Programming problems using approximate methods.
Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are drawn about the possible use of the different approaches in maintenance optimization.

Chapter 9 is an example of how finite horizon dynamic programming can be used for maintenance optimization.

Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.
Chapter 2
Maintenance
The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1. Some maintenance optimization models are reviewed in Section 2.2.

2.1 Types of Maintenance

Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 2.1 shows a general picture of the different types of maintenance.
Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way, or it is not worthwhile, to detect or prevent a failure.

Preventive maintenance aims at undertaking maintenance actions on a component before it fails, to avoid, e.g., high replacement costs, unsupplied power delivery, and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:

1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.
Figure 2.1: Maintenance tree, based on [1]. Maintenance divides into Preventive Maintenance, comprising Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM) (continuous, scheduled, or inspection based), and Corrective Maintenance.
2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods that use diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements, or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.
2.2 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in a very high cost.

The aim of maintenance optimization is to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influence factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.

In this section the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].
2.2.1 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson distribution (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.
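As a toy numerical illustration of this trade-off (not a model taken from the thesis; all parameter values are assumptions), the long-run cost rate of an age replacement policy can be evaluated with the classic renewal-reward formula g(T) = (Cp R(T) + Cf (1 − R(T))) / ∫₀ᵀ R(t) dt:

```python
import math

# Toy age-replacement illustration: Weibull lifetime, preventive replacement
# at age T or corrective replacement at failure. All parameters are assumed.

Cp, Cf = 1.0, 10.0          # preventive / corrective replacement costs
beta, eta = 3.0, 100.0      # Weibull shape (>1: wear-out) and scale

def R(t):
    """Survival function of the component lifetime."""
    return math.exp(-((t / eta) ** beta))

def cost_rate(T, n=2000):
    """Long-run cost per unit time of replacing at age T (renewal-reward):
       expected cycle cost over expected cycle length, midpoint-rule integral."""
    cycle = sum(R(T * (k + 0.5) / n) for k in range(n)) * (T / n)
    return (Cp * R(T) + Cf * (1.0 - R(T))) / cycle

best_T = min(range(10, 200, 5), key=cost_rate)   # grid search for the best age
```

A finite optimal age exists here because failures are more expensive than planned replacements and the failure rate increases (beta > 1); replacing very early or very late both cost more per unit time.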
2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid replacing a component that has just been replaced, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to account for the fact that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify the relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement
of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, at each decision epoch one must decide whether maintenance should be performed and when the next inspection should occur. In [2], the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9], a Semi-Markov Decision Process (SMDP, see Chapter 4) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (monitored variables).
2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities for performing preventive maintenance: with the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example. Transport to the wind farm by boat or helicopter is necessary and can be very expensive; by grouping maintenance actions, money could be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short-term information. The model can be used for many maintenance optimization models.
2.2.5 Other Types of Models and Criteria of Classification

Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important, e.g. multi-component models are more interesting in power systems. The time horizon considered in the model
is also important. Many articles consider an infinite time horizon; more focus should be put on finite horizons, since they are more practical. Another characteristic of a model is its time representation: discrete or continuous. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution; a model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: the generation units that produce the power. They can be, e.g., hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: the transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.
3. Distribution: the distribution system is at a voltage level below transmission and connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: the consumers can be divided into different categories: industry, commercial, household, office, agriculture, etc. The interruption costs are in general different for the different categories of consumers. These costs also depend on the duration of the outage.

The trade of electricity between producers and consumers is made through different specific markets around the world. The rules and organization are different for each market place. The bids for electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.
312 Maintenance in Power System
The objective is to find the right way to do maintenance Corrective Maintenanceand Preventive Maintenance should be balanced for each component of a systemand the optimal PM approaches should be determined
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at the KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).
Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
3.2 Costs
Possible costs and incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: the cost of the maintenance team that performs the maintenance actions.
• Spare part cost: the cost of a new component is an important part of the maintenance cost.
• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. For example, a helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.
• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.
• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.
• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: the size and availability of the maintenance staff is limited.
• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.
• Weather: the weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.
• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed; the transportation then has a price and takes time.
• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.
• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs of an optimization model.
• Statistical data: monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system (it is assumed in this thesis that the system is perfectly observable). An action is decided based on this state. This action results in an immediate cost (or reward) and influences the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following important ideas concerning Dynamic Programming are discussed
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:
An optimal policy has the property that, whatever the initial state and first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system; the previous decisions should not influence the actual evolution of the system and the possible actions.
Basically, in maintenance problems this means that maintenance actions only have an effect on the state of the system directly after their accomplishment; they do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.
If a system is subject to probabilistic events, it evolves according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.
Functional failures are in general represented as stochastic events. Consequently, stochastic maintenance optimization models are of interest.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner over time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time
In this thesis the focus is mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the time interval between two stages influences the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities are briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and are not discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω^X_k. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω^U_k(i).
Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).
Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:
J*_0(X_0) = min_{U_k} [ ∑_{k=0}^{N−1} C_k(X_k, U_k) + C_N(X_N) ]
subject to X_{k+1} = f_k(X_k, U_k),  k = 0, 1, ..., N−1
N: number of stages
k: current stage
i: state at the current stage
j: state at the next stage
X_k: state at stage k
U_k: decision (action) at stage k
C_k(i, u): cost function
C_N(i): terminal cost for state i
f_k(i, u): dynamic function
J*_0(i): optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:
J*_k(i) = min_{u ∈ Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]   (4.1)
J*_k(i): optimal cost-to-go from stage k to N, starting from state i
The value iteration algorithm is a direct consequence of the optimality equation
J*_N(i) = C_N(i)  ∀i ∈ Ω^X_N
J*_k(i) = min_{u ∈ Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]  ∀i ∈ Ω^X_k
U*_k(i) = argmin_{u ∈ Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]  ∀i ∈ Ω^X_k
u: decision variable
U*_k(i): optimal decision (action) at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
(Figure: staged graph of the shortest path problem. Stage 0: node A; stage 1: nodes B, C, D; stage 2: nodes E, F, G; stage 3: nodes H, I, J; stage 4: node K. Each arc between two consecutive stages is labeled with its cost.)
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:
Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:
Ω^U_k(i) = {0, 1} for i = 0, {0, 1, 2} for i = 1, {1, 2} for i = 2, for k = 1, 2, 3
Ω^U_0(0) = {0, 1, 2} for k = 0
For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E, or U_1(0) = 1 for the transition B ⇒ F.
Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F, or u_1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple, thanks to the notations used: f_k(i, u) = u.
The transition costs are defined as the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function
J*_0(0) = min_{U_k ∈ Ω^U_k(X_k)} [ ∑_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) ]
subject to X_{k+1} = f_k(X_k, U_k),  k = 0, 1, 2, 3
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem. The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solutions determined by the DP algorithm for the sequence of states that is actually visited.
The solution of the algorithm is given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the path A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2 and μ_1(2) = 2).
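The backward recursion can be sketched in Python for this example. Note that the text only states a few arc costs explicitly (C(A⇒B)=2, C(B⇒E)=4, C(B⇒F)=6, C(F⇒J)=2, C(J⇒K)=7); the remaining costs below are illustrative assumptions, chosen to be consistent with the reported optimum J*_0(0) = 8 along A-D-G-I-K.

```python
# Backward value iteration for the staged shortest path example.
# Arc costs marked "assumed" are illustrative; the thesis only fixes a few of them.
arcs = {
    'A': {'B': 2, 'C': 4, 'D': 3},      # A->C, A->D assumed
    'B': {'E': 4, 'F': 6},
    'C': {'E': 5, 'F': 3, 'G': 5},      # assumed
    'D': {'F': 5, 'G': 2},              # assumed
    'E': {'H': 3, 'I': 5},              # assumed
    'F': {'H': 5, 'I': 4, 'J': 2},      # F->H, F->I assumed
    'G': {'I': 1, 'J': 4},              # assumed
    'H': {'K': 5},                      # assumed
    'I': {'K': 2},                      # assumed
    'J': {'K': 7},
}

def shortest_path(arcs, terminal='K'):
    J = {terminal: 0.0}   # cost-to-go of the terminal node: J*_N = 0
    best = {}             # optimal successor (decision) for each node
    # Go backwards stage by stage: a node is evaluated once its successors are done.
    for i in ['J', 'I', 'H', 'G', 'F', 'E', 'D', 'C', 'B', 'A']:
        J[i], best[i] = min((arcs[i][j] + J[j], j) for j in arcs[i])
    # Recover the optimal path forwards from the initial node A.
    path, node = ['A'], 'A'
    while node != terminal:
        node = best[node]
        path.append(node)
    return J['A'], path

cost, path = shortest_path(arcs)
print(cost, path)   # 8.0 ['A', 'D', 'G', 'I', 'K'] with the assumed costs
```

With these costs the recursion reproduces the optimal value 8 and the path A ⇒ D ⇒ G ⇒ I ⇒ K, while the enumerated path A-B-F-J-K still costs 2+6+2+7 = 17 as in the text.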
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as follows:
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω^X_k.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).
Dynamics of the System and Transition Probabilities
In contrast to the deterministic case, the state transition depends not only on the control used but also on a disturbance ω = ω_k(i, u):
X_{k+1} = f_k(X_k, U_k, ω),  k = 0, 1, ..., N−1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:
P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:
P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:
C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)
If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).
A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that minimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:
J*(X_0) = min_{U_k ∈ Ω^U_k(X_k)} E[ C_N(X_N) + ∑_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]
subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)),  k = 0, 1, ..., N−1
N: number of stages
k: current stage
i: state at the current stage
j: state at the next stage
X_k: state at stage k
U_k: decision (action) at stage k
ω_k(i, u): probabilistic disturbance function
C_k(j, u, i): cost function
C_N(i): terminal cost for state i
f_k(i, u, ω): dynamic function
J*_0(i): optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is
J*_k(i) = min_{u ∈ Ω^U_k(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]   (5.1)
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
J*_k(i) = min_{u ∈ Ω^U_k(i)} ∑_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]   (5.2)
Ω^X_k: state space at stage k
Ω^U_k(i): decision space at stage k for state i
P_k(j, u, i): transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
J*_N(i) = C_N(i)  ∀i ∈ Ω^X_N   (initialisation)
While k ≥ 0 do
  J*_k(i) = min_{u ∈ Ω^U_k(i)} ∑_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]  ∀i ∈ Ω^X_k
  U*_k(i) = argmin_{u ∈ Ω^U_k(i)} ∑_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]  ∀i ∈ Ω^X_k
  k ← k − 1
u: decision variable
U*_k(i): optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
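The backward recursion above can be sketched in Python. The two-state component model below (0 = working, 1 = failed, with actions "do nothing" and "replace"), its probabilities and its costs are hypothetical illustrations; for simplicity the stage cost is taken to depend only on the current state and action, a special case of C_k(j, u, i).

```python
# Finite horizon stochastic value iteration (backward recursion).
# Hypothetical 2-state component: 0 = working, 1 = failed.
# Actions: 0 = do nothing, 1 = replace (back to working for sure).
N = 2                                   # number of stages
P = {                                   # P[u][i][j] = P(X_{k+1}=j | X_k=i, U_k=u)
    0: [[0.7, 0.3], [0.0, 1.0]],        # do nothing: a working unit may fail
    1: [[1.0, 0.0], [1.0, 0.0]],        # replace: always back to working
}
C = {0: [0.0, 4.0], 1: [2.0, 2.0]}      # C[u][i]: downtime cost 4, replacement cost 2
C_T = [0.0, 3.0]                        # terminal cost C_N(i)

J = [[0.0, 0.0] for _ in range(N + 1)]  # J[k][i]: cost-to-go
U = [[0, 0] for _ in range(N)]          # U[k][i]: optimal action
J[N] = list(C_T)                        # initialisation with the terminal cost
for k in range(N - 1, -1, -1):          # backward recursion, stops after k = 0
    for i in (0, 1):
        values = {u: C[u][i] + sum(P[u][i][j] * J[k + 1][j] for j in (0, 1))
                  for u in (0, 1)}
        U[k][i] = min(values, key=values.get)
        J[k][i] = values[U[k][i]]

print(J[0])   # expected cumulative cost from each initial state
print(U)      # optimal action for each stage and state
```

With these (assumed) numbers the recursion prescribes doing nothing while the component works and replacing it as soon as it fails, at every stage.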
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:
• N stages,
• N_X state variables, where the size of the set for each state variable is S,
• N_U control variables, where the size of the set for each control variable is A.
The time complexity of the algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem increases exponentially with the size of the problem (the number of state and decision variables). This characteristic of SDP is called the curse of dimensionality.
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for a component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be taken into account to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used complementarily.
Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. If there is no consumption, some generation units are stopped; this time can be used for the maintenance of the power plant.
Weather forecasts could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions at offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on time, if the system dynamics are not stationary).
This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω^X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω^U(i).
The objective is to find the optimal μ*, i.e. the policy that minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are paid.
J*(X_0) = min_μ E[ lim_{N→∞} ∑_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]
subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))),  k = 0, 1, ...
μ: decision policy
J*(i): optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost at stage k has the form α^k · C_ij(u). As C_ij(u) is bounded, the infinite sum converges (it is bounded by a decreasing geometric progression).
J*(X_0) = min_μ E[ lim_{N→∞} ∑_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]
subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))),  k = 0, 1, ...
α: discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be modelled with a cost-free termination state or with discounting. To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize:
J* = min_μ lim_{N→∞} E[ (1/N) · ∑_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]
subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))),  k = 0, 1, ...
6.2 Optimality Equations
The optimality equations are formulated using the probability function P(j, u, i).
The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
J*(i) = min_{u ∈ Ω^U(i)} ∑_{j ∈ Ω^X} P_ij(u) · [ C_ij(u) + J*(j) ]  ∀i ∈ Ω^X
J_μ(i): cost-to-go function of policy μ, starting from state i
J*(i): optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:
J*(i) = min_{u ∈ Ω^U(i)} ∑_{j ∈ Ω^X} P_ij(u) · [ C_ij(u) + α · J*(j) ]  ∀i ∈ Ω^X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively the algorithm should converge to the optimal policy, and it can indeed be shown that it converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative criterion must be determined to stop the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively; the process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ_0. It can then be described by the following steps:
Step 1: Policy Evaluation
If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μ_q}(i), the solution of the following linear system, is calculated:
J_{μ_q}(i) = ∑_{j ∈ Ω^X} P(j, μ_q(i), i) · [ C(j, μ_q(i), i) + J_{μ_q}(j) ]  ∀i ∈ Ω^X
q: iteration number of the policy iteration algorithm
This is the expected cost-to-go function of the system using the policy μ_q.
Step 2: Policy Improvement
A new policy is obtained using one step of the value iteration algorithm:
μ_{q+1}(i) = argmin_{u ∈ Ω^U(i)} ∑_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + J_{μ_q}(j) ]  ∀i ∈ Ω^X
Go back to the policy evaluation step. The process stops when μ_{q+1} = μ_q.
At each iteration the algorithm improves the policy. If the initial policy μ_0 is already good, then the algorithm converges quickly to the optimal solution.
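The two-step scheme can be sketched in Python. For concreteness the sketch treats the discounted case, where the evaluation step solves the linear system (I − α·P_μ)·J = c_μ; the two-state repair model (0 = working, 1 = failed, actions wait/repair) and all its numbers are hypothetical.

```python
import numpy as np

# Hypothetical 2-state repair MDP: state 0 = working, 1 = failed.
# Actions: 0 = wait, 1 = repair (back to state 0 for sure).
P = {0: np.array([[0.8, 0.2], [0.0, 1.0]]),   # P[u][i, j]
     1: np.array([[1.0, 0.0], [1.0, 0.0]])}
c = {0: np.array([0.0, 5.0]),                 # c[u][i]: expected stage cost
     1: np.array([2.0, 2.0])}
alpha = 0.9                                   # discount factor

def policy_iteration(P, c, alpha, n_states=2, n_actions=2):
    mu = [0] * n_states                       # initial policy: always wait
    while True:
        # Step 1: policy evaluation -- solve (I - alpha * P_mu) J = c_mu exactly.
        P_mu = np.array([P[mu[i]][i] for i in range(n_states)])
        c_mu = np.array([c[mu[i]][i] for i in range(n_states)])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement -- one value iteration step on J.
        Q = np.array([[c[u][i] + alpha * P[u][i] @ J for u in range(n_actions)]
                      for i in range(n_states)])
        mu_new = list(np.argmin(Q, axis=1))
        if mu_new == mu:                      # policy solves its own improvement
            return mu, J
        mu = mu_new

mu, J = policy_iteration(P, c, alpha)
print(mu)   # optimal stationary policy
print(J)    # corresponding cost-to-go
```

On this example the iteration terminates after a handful of policies with the intuitive rule "wait while working, repair when failed".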
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, in each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i).
While m ≥ 0 do
  J^m_{μ_k}(i) = ∑_{j ∈ Ω^X} P(j, μ_k(i), i) · [ C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j) ]  ∀i ∈ Ω^X
  m ← m − 1
m: number of iterations left in the evaluation step of modified policy iteration
The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and an arbitrary state X̄ ∈ Ω^X, there is a unique scalar λ_μ and vector h_μ such that:
h_μ(X̄) = 0
λ_μ + h_μ(i) = ∑_{j ∈ Ω^X} P(j, μ(i), i) · [ C(j, μ(i), i) + h_μ(j) ]  ∀i ∈ Ω^X
This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation
λ* + h*(i) = min_{u ∈ Ω^U(i)} ∑_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h*(j) ]  ∀i ∈ Ω^X
μ*(i) = argmin_{u ∈ Ω^U(i)} ∑_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h*(j) ]  ∀i ∈ Ω^X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X̄ is an arbitrary reference state and h_0(i) is chosen arbitrarily.
H_k = min_{u ∈ Ω^U(X̄)} ∑_{j ∈ Ω^X} P(j, u, X̄) · [ C(j, u, X̄) + h_k(j) ]
h_{k+1}(i) = min_{u ∈ Ω^U(i)} ∑_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h_k(j) ] − H_k  ∀i ∈ Ω^X
μ_{k+1}(i) = argmin_{u ∈ Ω^U(i)} ∑_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h_k(j) ]  ∀i ∈ Ω^X
The sequence h_k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
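The recursion above can be sketched in Python on a hypothetical two-state repair model (0 = working, 1 = failed; actions wait/repair, with assumed probabilities and costs). For this small model the optimal average cost can be checked by hand: the policy "wait while working, repair when failed" induces the stationary distribution (5/6, 1/6) and average cost (1/6)·2 = 1/3.

```python
# Relative value iteration for an average cost per stage problem.
# Hypothetical 2-state repair MDP: state 0 = working, 1 = failed.
P = {0: [[0.8, 0.2], [0.0, 1.0]],   # action 0 = wait:   P[u][i][j]
     1: [[1.0, 0.0], [1.0, 0.0]]}   # action 1 = repair: back to state 0
c = {0: [0.0, 5.0],                 # c[u][i]: staying failed while waiting costs 5
     1: [2.0, 2.0]}                 # repairing costs 2 in any state

states, actions, ref = (0, 1), (0, 1), 0   # ref: arbitrary reference state
h = [0.0, 0.0]                             # differential cost h_0, chosen arbitrarily
for _ in range(200):
    # One application of the dynamic programming operator T to h.
    Th = [min(c[u][i] + sum(P[u][i][j] * h[j] for j in states) for u in actions)
          for i in states]
    lam = Th[ref]                          # H_k: current estimate of the average cost
    h = [Th[i] - lam for i in states]      # normalise so that h(ref) = 0

mu = [min(actions, key=lambda u: c[u][i] + sum(P[u][i][j] * h[j] for j in states))
      for i in states]
print(lam, h, mu)   # average cost approx 1/3, policy: wait in 0, repair in 1
```

Since the model is unichain, the sequence h_k settles quickly and λ converges to the hand-computed optimal average cost 1/3.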
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: X̄ can be chosen arbitrarily.
Step 1: Policy Evaluation
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω^X, stop the algorithm. Else, solve the system of equations:
h_q(X̄) = 0
λ_q + h_q(i) = ∑_{j ∈ Ω^X} P(j, μ_q(i), i) · [ C(j, μ_q(i), i) + h_q(j) ]  ∀i ∈ Ω^X
Step 2: Policy Improvement
μ_{q+1}(i) = argmin_{u ∈ Ω^U(i)} ∑_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h_q(j) ]  ∀i ∈ Ω^X
q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case:
J*(i) = min_{u ∈ Ω^U(i)} ∑_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + α · J*(j) ]  ∀i ∈ Ω^X
Jmicro(i) is solution of the following linear programming model
MinimizesumiisinΩXJmicro(i)
Subject to Jmicro(i) +sumjisinΩX α middot Jmicro(j) middot C(j u i) le
sumjisinΩX P (j u i) middot C(j u i)forallu i
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
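The LP formulation can be checked numerically: the optimal discounted cost-to-go J* is the componentwise-largest vector that is feasible for the LP constraints. The sketch below computes J* by value iteration on a hypothetical two-state MDP and verifies that every LP constraint holds.

```python
# Hypothetical two-state discounted MDP; J* is computed by value
# iteration and then checked against the LP constraints
# J(i) <= sum_j P(j,u,i) * (C(j,u,i) + alpha * J(j)).
ALPHA = 0.9
P = {0: [[0.9, 0.1], [0.0, 1.0]],   # P[u][i][j]
     1: [[1.0, 0.0], [1.0, 0.0]]}
C = {0: [0.0, 5.0], 1: [2.0, 2.0]}  # expected one-stage cost C[u][i]
states = range(2)

J = [0.0, 0.0]
for _ in range(2000):               # value iteration to the fixed point
    J = [min(C[u][i] + ALPHA * sum(P[u][i][j] * J[j] for j in states)
             for u in P)
         for i in states]

# J* is feasible for the LP, and one constraint per state is tight
# (at the minimizing action), which is why J* maximizes sum_i J(i).
feasible = all(
    J[i] <= C[u][i] + ALPHA * sum(P[u][i][j] * J[j] for j in states) + 1e-9
    for i in states for u in P)
```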
6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m; a DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, but the actions are not taken continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how an SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. This chapter presents methods that overcome this problem by approximation; they make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists; the methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a laborious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly in Section 7.3.

The RL methods are extensions of the methods presented in Section 7.2; they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck), where Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6, and can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.

Policy evaluation by simulation: Assume that a trajectory (X0, ..., XN) has been generated according to the policy μ, and that the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.
The cost-to-go resulting from the trajectory starting from the state Xk is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

where V(Xk) denotes the cost-to-go of a trajectory starting from state Xk.
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J̃(i) = (1/K) Σ_{m=1}^{K} V(i, m)

where V(i, m) is the cost-to-go observed from state i after its m-th visit.
A recursive form of the method can be formulated:

J̃(i) := J̃(i) + γ·[V(i, m) − J̃(i)],   with γ = 1/m,

where m is the number of visits to state i so far. From a trajectory point of view,

J̃(Xk) := J̃(Xk) + γ_{Xk}·[V(Xk) − J̃(Xk)]

where γ_{Xk} corresponds to 1/m, with m the number of times Xk has already been visited by trajectories.
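The recursive update with γ = 1/m reproduces the batch average of the observed costs-to-go, as the following sketch (with made-up sample values) illustrates.

```python
# The update J := J + (1/m) * (V_m - J), applied for m = 1, 2, ...,
# equals the running average of the samples V_1, ..., V_m.
samples = [4.0, 7.0, 1.0, 8.0]       # hypothetical observed costs-to-go

J = 0.0
for m, v in enumerate(samples, start=1):
    J += (1.0 / m) * (v - J)         # gamma = 1/m

batch_mean = sum(samples) / len(samples)
```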
With the preceding algorithm, V(Xk) must be calculated from the whole trajectory, and can therefore be used only once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = C(Xk, Xk+1) + V(Xk+1).
At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assume that the l-th transition has just been generated; then J̃(Xk) is updated for all states visited previously during the trajectory:

J̃(Xk) := J̃(Xk) + γ_{Xk}·[C(Xl, Xl+1) + J̃(Xl+1) − J̃(Xl)],   ∀k = 0, ..., l
TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J̃(Xk) := J̃(Xk) + γ_{Xk}·λ^{l−k}·[C(Xl, Xl+1) + J̃(Xl+1) − J̃(Xl)],   ∀k = 0, ..., l
Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J̃(Xk) := J̃(Xk) + γ_{Xk}·[C(Xk, Xk+1) + J̃(Xk+1) − J̃(Xk)]
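A minimal TD(0) sketch on a hypothetical deterministic shortest-path chain can look as follows; the chain, costs and policy are made up, and the exact costs-to-go are J(0) = 3 and J(1) = 2.

```python
# TD(0) evaluation of a fixed policy on the hypothetical chain
# 0 -> 1 -> 2 (terminal), with made-up stage costs.
NEXT = {0: 1, 1: 2}            # transitions under the policy being evaluated
COST = {0: 1.0, 1: 2.0}        # C(X_k, X_{k+1})
TERMINAL = 2

J = {0: 0.0, 1: 0.0, TERMINAL: 0.0}
visits = {0: 0, 1: 0}

for _ in range(50):            # 50 simulated trajectories
    x = 0
    while x != TERMINAL:
        nxt = NEXT[x]
        visits[x] += 1
        gamma = 1.0 / visits[x]                  # gamma_Xk = 1/m
        # TD(0) update: move J(x) toward the bootstrapped target
        J[x] += gamma * (COST[x] + J[nxt] - J[x])
        x = nxt
```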
Q-factors: Once J^{μk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^{μk}(i, u) = Σ_{j∈Ω_X} P(j, u, i)·[C(j, u, i) + J^{μk}(j)]

Note that the transition probabilities and costs must be known for this step. The improved policy is

μ^{k+1}(i) = argmin_{u∈Ω_U(i)} Q^{μk}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^{μk} and Q^{μk} have been estimated from the samples.
7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i)·[C(j, u, i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i)·[C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do:

Uk = argmin_{u∈Ω_U(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ)·Q(Xk, Uk) + γ·[C(Xk, Xk+1, Uk) + min_{u∈Ω_U(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
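A sketch of the Q-learning iteration on a hypothetical deterministic two-state chain follows; an ε-greedy choice of the control is used so that all state-control pairs keep being tried (the exploration/exploitation trade-off discussed next).

```python
import random

# Q-learning on a made-up deterministic chain.  States 0 and 1, terminal
# state 2.  Action 0 ("step") costs 1 and moves one state to the right;
# action 1 ("jump") costs 2.5 and goes directly to the terminal state.
# Optimal Q-factors: Q*(1,0)=1, Q*(1,1)=2.5, Q*(0,0)=2, Q*(0,1)=2.5.
random.seed(0)
TERMINAL = 2

def simulate(x, u):
    """Hypothetical system: return (next state, observed cost)."""
    return (x + 1, 1.0) if u == 0 else (TERMINAL, 2.5)

Q = {(x, u): 0.0 for x in (0, 1) for u in (0, 1)}
visits = dict.fromkeys(Q, 0)

for _ in range(200):                       # simulated trajectories
    x = 0
    while x != TERMINAL:
        # exploration/exploitation trade-off: epsilon-greedy choice
        if random.random() < 0.3:
            u = random.choice((0, 1))                    # explore
        else:
            u = min((0, 1), key=lambda a: Q[(x, a)])     # greedy (exploit)
        nxt, cost = simulate(x, u)
        visits[(x, u)] += 1
        gamma = 1.0 / visits[(x, u)]       # stepsize as for TD
        target = cost + (0.0 if nxt == TERMINAL
                         else min(Q[(nxt, a)] for a in (0, 1)))
        Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
        x = nxt
```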
The exploration/exploitation trade-off: Convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building the model of the transition probabilities and the cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems, but for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jμ(i). It is replaced by a suitable approximation J̃(i, r), where r is a parameter vector that is optimized based on the available samples of Jμ. In the tabular representation investigated previously, Jμ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.
A general approach to a supervised learning problem is:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either from simulation or from real-time samples; this is already an approximation of the real function.
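As a minimal illustration of the parametric idea, the sketch below fits a linear approximation J̃(i, r) = r0 + r1·φ(i) by least squares; the feature φ and the training samples are made up, and the target happens to be exactly linear, so the fit is exact and generalizes beyond the training set.

```python
# Least-squares fit of a linear cost-to-go approximation
# J~(i, r) = r0 + r1 * phi(i) to hypothetical training pairs.
def phi(i):
    return float(i)                 # hypothetical feature: component age

train = [(i, 3.0 + 0.5 * i) for i in range(10)]   # (state, sampled J)

n = len(train)
sx = sum(phi(i) for i, _ in train)
sy = sum(v for _, v in train)
sxx = sum(phi(i) ** 2 for i, _ in train)
sxy = sum(phi(i) * v for i, v in train)
r1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)    # slope
r0 = (sy - r1 * sx) / n                           # intercept

def J_hat(i):
    # only the vector r = (r0, r1) is stored, not a table of J-values
    return r0 + r1 * phi(i)
```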
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization, and the short-term planning uses these activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon maintenance scheduling problem for generating units. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP model. Major and minor maintenance are both possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants; the main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined for deviations from normal operation of the system. The proposed approach should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go criterion, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary.
  Possible application in maintenance optimization: short-term maintenance optimization and scheduling.
  Method: value iteration.
  Disadvantage: limited state space (number of components).

Markov Decision Processes
  Characteristics: stationary model; classical solution methods for MDPs. Possible criteria: average cost-to-go (continuous-time condition monitoring maintenance optimization), discounted (short-term maintenance optimization), shortest path.
  Methods: value iteration (VI), which can converge fast for a high discount factor; policy iteration (PI), in general faster; linear programming, which allows additional constraints but handles a smaller state space than VI and PI.

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval (average cost-to-go approach); more complex.
  Possible application in maintenance optimization: inspection-based maintenance.
  Methods: same as MDPs.

Approximate Dynamic Programming for MDPs
  Characteristics: can handle larger state spaces than classical MDP methods.
  Possible application in maintenance optimization: same as MDPs, for larger systems.
  Methods: TD-learning and Q-learning; can work without an explicit model.
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component, and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model

In this section, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance during a profitable period. This idea was considered in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers

N_E    Number of electricity scenarios
N_W    Number of working states for the component
N_PM   Number of preventive maintenance states for the component
N_CM   Number of corrective maintenance states for the component

Costs

C_E(s, k)   Electricity cost at stage k in electricity state s
C_I         Cost per stage for interruption
C_PM        Cost per stage of preventive maintenance
C_CM        Cost per stage of corrective maintenance
C_N(i)      Terminal cost if the component is in state i

Variables

i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage

State and Control Space

x1_k   Component state at stage k
x2_k   Electricity state at stage k

Probability functions

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state W_i

Sets

Ω_x1     Component state space
Ω_x2     Electricity state space
Ω_U(i)   Decision space for state i

State notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state
9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• At each stage, it is possible to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. The electricity price may switch from one scenario to another during the time span; the probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (N_X = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),   x1_k ∈ Ω_x1, x2_k ∈ Ω_x2   (9.1)

Ω_x1 is the set of possible states for the component, and Ω_x2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can for example correspond to the time when λ(t) > 50% for t > Tmax. The latter approach was implemented. In both cases, the corresponding number of W states is N_W = Tmax/Ts, or the closest integer.
Figure 9.1: Example of the Markov decision process for one component, with N_CM = 3, N_PM = 2 and N_W = 4. Solid lines: u = 0 (transition probabilities 1 − Ts·λ(q) to the next working state and Ts·λ(q) to CM1); dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{N_PM−1}, CM1, ..., CM_{N_CM−1}}
Electricity scenario state

Electricity scenarios are associated with the state variable x2_k. There are N_E possible states for this variable, each corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively a dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, N_E = 3 (electricity price in SEK/MWh, between 200 and 500, as a function of the stage for scenarios 1 to 3).
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW}, and Ω_U(i) = ∅ otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent, and only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)
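The factorization can be illustrated with made-up component and electricity transition matrices; the joint transition probabilities are the products of the marginal ones, and each row still sums to one over the possible next states.

```python
# Made-up component transition probabilities P_comp[u][i1][j1] and
# electricity-scenario transition matrix P_elec[i2][j2] for one stage.
P_comp = {0: [[0.8, 0.2], [0.0, 1.0]],
          1: [[1.0, 0.0], [1.0, 0.0]]}
P_elec = [[0.6, 0.4], [0.3, 0.7]]

def joint(u):
    """Joint transition P[(i1,i2)][(j1,j2)] = P(j1,u,i1) * Pk(j2,i2)."""
    return {(i1, i2): {(j1, j2): P_comp[u][i1][j1] * P_elec[i2][j2]
                       for j1 in (0, 1) for j2 in (0, 1)}
            for i1 in (0, 1) for i2 in (0, 1)}

# every row of the joint transition matrix still sums to one
rows_sum_to_one = all(abs(sum(row.values()) - 1.0) < 1e-9
                      for row in joint(0).values())
```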
Component state transition probabilities

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q·Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                            u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., N_W − 1}     0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., N_W − 1}     0   CM1      λ(Wq)
W_NW                          0   W_NW     1 − λ(W_NW)
W_NW                          0   CM1      λ(W_NW)
Wq, q ∈ {0, ..., N_W}         1   PM1      1
PMq, q ∈ {1, ..., N_PM − 2}   ∅   PMq+1    1
PM_{N_PM − 1}                 ∅   W0       1
CMq, q ∈ {1, ..., N_CM − 2}   ∅   CMq+1    1
CM_{N_CM − 1}                 ∅   W0       1
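The component transition structure of Table 9.1 can be sketched in code as follows; the failure rates are hypothetical, and the per-stage failure probability Ts·λ(q) follows the labels of Figure 9.1.

```python
# Component-state transitions for one component, with the example sizes
# N_W = 4, N_PM = 2, N_CM = 3 of Figure 9.1 and made-up failure rates.
TS = 1.0                                   # stage length (arbitrary unit)
LAM = [0.01, 0.02, 0.04, 0.07, 0.12]       # lambda(W_q), q = 0..N_W, made up
N_W, N_PM, N_CM = 4, 2, 3

states = (["W%d" % q for q in range(N_W + 1)]
          + ["PM%d" % q for q in range(1, N_PM)]
          + ["CM%d" % q for q in range(1, N_CM)])

def transitions(state, u):
    """Return {next_state: probability} for decision u (ignored in
    maintenance states, where the decision space is empty)."""
    kind = state[:2].rstrip("0123456789")
    q = int(state.lstrip("WPCM"))
    if kind == "W":
        if u == 1:                         # start preventive maintenance
            return {"PM1": 1.0}
        nxt = "W%d" % min(q + 1, N_W)      # age saturates at W_NW
        p_fail = TS * LAM[q]               # per-stage failure probability
        return {nxt: 1.0 - p_fail, "CM1": p_fail}
    if kind == "PM":
        return {"PM%d" % (q + 1): 1.0} if q < N_PM - 1 else {"W0": 1.0}
    return {"CM%d" % (q + 1): 1.0} if q < N_CM - 1 else {"W0": 1.0}
```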
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = | 1   0   0 |      P2_E = | 1/3  1/3  1/3 |      P3_E = | 0.6  0.2  0.2 |
       | 0   1   0 |             | 1/3  1/3  1/3 |             | 0.2  0.6  0.2 |
       | 0   0   1 |             | 1/3  1/3  1/3 |             | 0.2  0.2  0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2)    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• reward for electricity generation, G·Ts·C_E(i2, k) (depends on the electricity scenario state i2 and the stage k);

• costs for maintenance, C_CM or C_PM;

• cost for interruption, C_I.

Moreover, a terminal cost C_N(i), defined for each possible terminal component state i, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4: Transition costs

i1                            u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., N_W − 1}     0   Wq+1     G·Ts·C_E(i2, k)
Wq, q ∈ {0, ..., N_W − 1}     0   CM1      C_I + C_CM
W_NW                          0   W_NW     G·Ts·C_E(i2, k)
W_NW                          0   CM1      C_I + C_CM
Wq                            1   PM1      C_I + C_PM
PMq, q ∈ {1, ..., N_PM − 2}   ∅   PMq+1    C_I + C_PM
PM_{N_PM − 1}                 ∅   W0       C_I + C_PM
CMq, q ∈ {1, ..., N_CM − 2}   ∅   CMq+1    C_I + C_CM
CM_{N_CM − 1}                 ∅   W0       C_I + C_CM
9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The cost of renting them can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c
Costs
CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i
Variables
ic, c ∈ {1,...,NC}   State of component c at the current stage
iNC+1                Electricity state at the current stage
jc, c ∈ {1,...,NC}   State of component c at the next stage
jNC+1                Electricity state at the next stage
uc, c ∈ {1,...,NC}   Decision variable for component c
State and Control Space

xck, c ∈ {1,...,NC}   State of component c at stage k
xc                    A component state
xNC+1,k               Electricity state at stage k
uck                   Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc       State space for component c
ΩxNC+1    Electricity state space
Ωuc(ic)   Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1,...,NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The duration of a preventive replacement of component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is incurred whenever maintenance is performed on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector, as in (9.2):

  Xk = (x1k, ..., xNCk, xNC+1,k)^T    (9.2)

xck, c ∈ {1,...,NC}, represents the state of component c, and xNC+1,k represents the electricity state.
Component Space
The numbers of CM and PM states for component c are NCMc and NPMc respectively. The number of working states NWc for each component c is decided in the same way as for one component. The state space related to component c is noted Ωxc:

  xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity Space
Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

  Uk = (u1k, u2k, ..., uNCk)^T    (9.3)
The decision space for each decision variable can be defined by:

  ∀c ∈ {1,...,NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0,...,WNWc}, ∅ otherwise
9.2.4.3 Transition Probability

The component state variables xc are independent of the electricity state xNC+1. Consequently:

  P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
    = P((j1,...,jNC) | (u1,...,uNC), (i1,...,iNC)) · P(jNC+1 | iNC+1)    (9.5)

The transition probabilities of the electricity states, P(jNC+1 | iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1,...,NC}: ic ∈ {W1,...,WNWc},

  P((j1,...,jNC) | 0, (i1,...,iNC)) = ∏_{c=1}^{NC} P(jc | 0, ic)
Case 2

If one of the components is in maintenance, or preventive maintenance is decided for at least one component, then:

  P((j1,...,jNC) | (u1,...,uNC), (i1,...,iNC)) = ∏_{c=1}^{NC} P^c

with

  P^c = P(jc | 1, ic)   if uc = 1 or ic ∉ {W1,...,WNWc}
        1               if uc = 0, ic ∈ {W1,...,WNWc} and jc = ic
        0               else
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1,...,NC}: ic ∈ {W1,...,WNWc},

  C((j1,...,jNC) | 0, (i1,...,iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

  C((j1,...,jNC) | (u1,...,uNC), (i1,...,iNC)) = CI + ∑_{c=1}^{NC} C^c

with

  C^c = CCMc   if ic ∈ {CM1,...,CMNCMc−1} or jc = CM1
        CPMc   if ic ∈ {PM1,...,PMNPMc−1} or jc = PM1
        0      else
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm empirically converges fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3,0,0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3,1,0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3,2,0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2,0,0), J*3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2,0,0), J*3(1) + C(2,0,1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2,1,0), J*3(1) + C(2,1,1), J*3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2,1,0), J*3(1) + C(2,1,1), J*3(2) + C(2,1,2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2,2,1), J*3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2,2,1), J*3(2) + C(2,2,2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1,0,0), J*2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1,0,0), J*2(1) + C(1,0,1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1,1,0), J*2(1) + C(1,1,1), J*2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1,1,0), J*2(1) + C(1,1,1), J*2(2) + C(1,1,2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1,2,1), J*2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1,2,1), J*2(2) + C(1,2,2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0,0,0), J*1(1) + C(0,0,1), J*1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0,0,0), J*1(1) + C(0,0,1), J*1(2) + C(0,0,2)} = 2
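The backward recursion above can be reproduced in a few lines of code; the arc costs C(k, i, u) are read off the worked example, with the decision u indexing the successor state:

```python
# Backward value iteration for the shortest path example.  The arc costs
# C[k][(i, u)] are read off the worked computation above.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},
    1: {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3, (2, 1): 5, (2, 2): 2},
    2: {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2, (2, 1): 1, (2, 2): 2},
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},
}

J = {4: {0: 0.0}}                      # terminal cost: phi(0) = 0
policy = {}
for k in range(3, -1, -1):             # backward recursion, stage 3 down to 0
    J[k], policy[k] = {}, {}
    for i in {s for s, _ in C[k]}:
        succ = {u: cost + J[k + 1][u] for (s, u), cost in C[k].items() if s == i}
        policy[k][i] = min(succ, key=succ.get)
        J[k][i] = succ[policy[k][i]]

print(J[0][0])                         # -> 8.0, the shortest path cost from A
```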
Reference List
[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001
[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995
[3] SV Amari and LH Pham Cost-effective condition-based maintenance using Markov decision processes Reliability and Maintainability Symposium 2006 RAMS'06 Annual pages 464ndash469 2006
[4] N Andréasson Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems Technical report Chalmers, Göteborg University 2004 Licentiate Thesis
[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996
[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994
[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965
[8] R Bellman Dynamic Programming Princeton University Press Princeton1957
[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997
[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976
[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979
[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005
[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996
[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006
[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991
[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997
[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966
[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004
[19] A Haurie and P L'Ecuyer A stochastic control approach to group preventive replacement in a multicomponent system IEEE Transactions on Automatic Control 27(2)387ndash393 1982
[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004
[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004
[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004
[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996
[24] D Kalles A Stathaki and RE King Intelligent monitoring and maintenance of power plants In Workshop on «Machine learning applications in the electric power industry» Chania Greece 1999
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P L'Ecuyer and A Haurie Preventive replacement for multicomponent systems An opportunistic discrete time dynamic programming model IEEE Transactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour Short-term transmission line maintenance scheduling in a deregulated system Power Industry Computer Applications 1999 PICA'99 Proceedings of the 21st 1999 IEEE International Conference pages 31ndash37 1999
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson Maintenance management of wind power systems - cost effect analysis of condition monitoring systems Master's thesis Royal Institute of Technology (KTH) April 2006
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic Dynamic Programming John Wiley & Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement of systems subject to shocks and random threshold failure International Journal of Quality & Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft Master's thesis Royal Institute of Technology (KTH) May 2005
respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving infinite horizon Dynamic Programming problems using approximate methods.
Chapter 8 gives a review of some maintenance optimization models based on dy-namic programming Conclusions are made about possible use of the differentapproaches in maintenance optimization
Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization
Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.
Chapter 2
Maintenance
The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1, and some maintenance optimization models are reviewed in Section 2.2.

2.1 Types of Maintenance
Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item, intended to retain it or restore it to a state in which it can perform the required functions [1]. Figure 2.1 shows a general picture of the different types of maintenance.
Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way, or it is not worthwhile, to detect or prevent a failure.
Preventive maintenance aims at undertaking maintenance actions on a component before it fails, e.g. to avoid the high costs of replacement, unsupplied power, and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:
1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.
[Figure 2.1: Maintenance tree, based on [1]. Maintenance divides into Preventive Maintenance, comprising Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM) (continuous, scheduled, or inspection based), and Corrective Maintenance.]
2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all maintenance methods that use diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements, or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age related failures.
2.2 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive corrective maintenance. Preventive maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in a very high cost.

The aim of maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influencing factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.

In this section the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].
2.2.1 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if preventive replacement is less expensive than corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.
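As a numerical illustration of this trade-off (not part of the original model), the classical long-run cost rate of age replacement, c(T) = [cp R(T) + cf (1 − R(T))] / ∫0^T R(t) dt, can be minimized over the replacement age T. The Weibull lifetime parameters, the costs, and the search grid below are illustrative assumptions:

```python
import math

# Illustrative age replacement trade-off: minimize the long-run cost rate
#   c(T) = (c_p * R(T) + c_f * (1 - R(T))) / integral_0^T R(t) dt
# over the replacement age T.  Weibull lifetime and costs are assumptions.
beta, eta = 2.5, 10.0            # increasing failure rate since beta > 1
c_p, c_f = 1.0, 5.0              # preventive vs corrective replacement cost

def reliability(t):
    """Weibull survival function R(t)."""
    return math.exp(-((t / eta) ** beta))

def cost_rate(T, n=1000):
    """Expected cost per unit time under replacement age T (trapezoidal rule)."""
    h = T / n
    expected_cycle_length = sum(
        0.5 * h * (reliability(m * h) + reliability((m + 1) * h)) for m in range(n))
    expected_cycle_cost = c_p * reliability(T) + c_f * (1.0 - reliability(T))
    return expected_cycle_cost / expected_cycle_length

# Grid search over candidate replacement ages 0.1, 0.2, ..., 30.0
best_T = min((T / 10.0 for T in range(1, 301)), key=cost_rate)
```

With an increasing failure rate and c_f > c_p, the optimal T is finite; letting T grow recovers the run-to-failure cost rate c_f / MTTF.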
A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]: if the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process whose rate is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.
2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid replacing a component that has just been replaced, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation with failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2], inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9], a Semi-Markov Decision Process (SMDP, see Chapter 4) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (monitored variables).
2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities for performing preventive maintenance. After the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: transport to the wind farm by boat or helicopter is necessary and can be very expensive, so money could be saved by grouping maintenance actions.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short term information. The model can be used for many maintenance optimization models.
2.2.5 Other Types of Models and Criteria of Classification

Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g., multi-component models are more relevant in power systems. The time horizon considered in the model is also important: many articles consider an infinite time horizon, but more focus should be put on finite horizons since they are more practical. Another characteristic of a model is the time representation, discrete or continuous. One distinction can be made between models with deterministic and stochastic lifetimes of components; among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models, exact solutions are possible; for complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power System
This chapter gives a brief description of electrical power systems Some costs andconstraints for a maintenance model are proposed
3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables with limited capacities. With the deregulation of power systems, the generation, transmission and distribution systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: the generating units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: the transmission system is composed of high voltage, high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.
3. Distribution: the distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: consumers can be divided into different categories: industrial, commercial, residential, office, agricultural, etc. The costs of interruption are in general different for the different categories of consumers; these costs also depend on the time of outage.
The trade of electricity between producers and consumers is made through different specific markets around the world. The rules and organization are different for each market place. Bids for electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others: if a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.
3.1.2 Maintenance in Power Systems

The objective is to find the right way to do maintenance: corrective and preventive maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to finding a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).
Research on power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems, and [22], [30] for transmission systems).
The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
3.2 Costs
Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: the cost of the maintenance team that performs the maintenance actions.
• Spare part cost: the cost of a new component is an important part of the maintenance cost.
• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter, for example, can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.
• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.
• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.
• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: the size and availability of the maintenance staff is limited.
• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.
• Weather: the weather can force certain maintenance actions to be postponed; e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.
• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed; the transportation then has a price and takes time.
• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.
• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs of an optimization model.
• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that satisfies the principle of optimality:
An optimal policy has the property that whatever the initial state and first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves part of the solution of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system; the previous decisions should not influence the future evolution of the system and the possible actions.
In maintenance problems, this basically means that maintenance actions only have an effect on the state of the system directly after their accomplishment; they do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.
If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and action. The system is then referred to as probabilistic or stochastic.
Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are of interest.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner at all times. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the length of the interval between two stages will have an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuous set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decision making refers to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example: a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for $N$ stages.
State and Decision Spaces: At each stage $k$, the system is in a state $X_k = i$ that belongs to a state space $\Omega_k^X$. Depending on the state of the system, the decision maker decides on an action $u = U_k \in \Omega_k^U(i)$.
Dynamic and Cost Functions: As a result of this action, the system state at the next stage will be $X_{k+1} = f_k(i, u)$. Moreover, the action has a cost that the decision maker has to pay, $C_k(i, u)$. A possible terminal cost $C_N(X_N)$ is associated with the terminal state (the state at stage $N$).
Objective Function: The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamic of the system:
$$J_0^*(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$
Subject to $X_{k+1} = f_k(X_k, U_k), \quad k = 0, \ldots, N-1$
$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision action at stage $k$
$C_k(i, u)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u)$: dynamic function
$J_0^*(i)$: optimal cost-to-go starting from state $i$
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage $k$ can be derived with the following formula:
$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad (4.1)$$
$J_k^*(i)$: optimal cost-to-go from stage $k$ to $N$, starting from state $i$
The value iteration algorithm is a direct consequence of the optimality equation
$$J_N^*(i) = C_N(i) \quad \forall i \in \Omega_N^X$$
$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_k^X$$
$$U_k^*(i) = \arg\min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_k^X$$
$u$: decision variable
$U_k^*(i)$: optimal decision action at stage $k$ for state $i$
The algorithm goes backwards, starting from the last stage. It stops when $k = 0$ is reached.
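The backward value iteration algorithm above can be sketched in a few lines. This is a hedged illustration, not the thesis's implementation: the interface (`states`, `actions`, `f`, `C`, `C_N`) and the tiny two-stage instance at the bottom are hypothetical.

```python
# A minimal sketch of backward value iteration for a deterministic
# finite-horizon DP model (hypothetical interface and toy data).

def value_iteration(N, states, actions, f, C, C_N):
    """Return optimal cost-to-go J[k][i] and optimal decisions U[k][i]."""
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = C_N(i)                       # initialisation: terminal costs
    for k in range(N - 1, -1, -1):             # backwards, stops after k = 0
        for i in states[k]:
            # minimise immediate cost plus cost-to-go of the resulting state
            best = min(actions(k, i),
                       key=lambda u: C(k, i, u) + J[k + 1][f(k, i, u)])
            U[k][i] = best
            J[k][i] = C(k, i, best) + J[k + 1][f(k, i, best)]
    return J, U

# Tiny two-stage instance: from state 0 one can go to state 0 (cost 1)
# or state 1 (cost 5); the final transition costs 10 from 0 and 2 from 1.
states = {0: [0], 1: [0, 1], 2: [0]}
J, U = value_iteration(
    N=2, states=states,
    actions=lambda k, i: [0, 1] if k == 0 else [0],
    f=lambda k, i, u: u if k == 0 else 0,
    C=lambda k, i, u: {(0, 0): 1, (0, 1): 5}[(i, u)] if k == 0 else (10 if i == 0 else 2),
    C_N=lambda i: 0)
# J[0][0] == 7: going through state 1 (5 + 2) beats staying at 0 (1 + 10)
```

The dictionaries indexed by stage mirror the stage-dependent state spaces $\Omega_k^X$ of the formulation above.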
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: a directed stage graph for the shortest path problem. Stage 0 contains node A; stage 1 contains nodes B, C, D; stage 2 contains E, F, G; stage 3 contains H, I, J; stage 4 contains K. Each arc between consecutive stages is labeled with its cost.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the costs of all possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: $n = 5$, $k = 0, 1, 2, 3, 4$.
State Space: The state space is defined for each stage:
$\Omega_0^X = \{A\} = \{0\}$, $\Omega_1^X = \{B, C, D\} = \{0, 1, 2\}$, $\Omega_2^X = \{E, F, G\} = \{0, 1, 2\}$,
$\Omega_3^X = \{H, I, J\} = \{0, 1, 2\}$, $\Omega_4^X = \{K\} = \{0\}$
Each node of the problem is defined by a state $X_k$. For example, $X_2 = 1$ corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which $X_k$ would be a vector.
Decision Space: The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notation is used:
$$\Omega_k^U(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \quad \text{for } k = 1, 2, 3$$
$$\Omega_0^U(0) = \{0, 1, 2\} \quad \text{for } k = 0$$
For example, $\Omega_1^U(0) = \Omega^U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition $B \Rightarrow E$ and $U_1(0) = 1$ for the transition $B \Rightarrow F$.
Another example: $\Omega_1^U(2) = \Omega^U(D) = \{1, 2\}$, with $U_1(2) = 1$ for the transition $D \Rightarrow F$ and $U_1(2) = 2$ for the transition $D \Rightarrow G$.
A sequence $\pi = \{\mu_0, \mu_1, \ldots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state $i$ at stage $k$ to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu_0^*, \mu_1^*, \ldots, \mu_N^*\}$.
Dynamic and Cost Functions: The dynamic function of the example is simple thanks to the notation used: $f_k(i, u) = u$.
The transition costs are defined as the distance from one state to the state resulting from the decision. For example, $C_1(0, 0) = C(B \Rightarrow E) = 4$. The cost function is defined in the same way for the other stages and states.
Objective Function
$$J_0^*(0) = \min_{U_k \in \Omega_k^U(X_k)} \left[ \sum_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) \right]$$
Subject to $X_{k+1} = f_k(X_k, U_k), \quad k = 0, 1, \ldots, 3$
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solutions determined by the DP algorithm for the sequence of states that are visited.
The solution of the algorithm is given in Appendix A.
The optimal cost-to-go is $J_0^*(0) = 8$. It corresponds to the path $A \Rightarrow D \Rightarrow G \Rightarrow I \Rightarrow K$. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$ with $\mu_k(i) = u_k^*(i)$ (for example, $\mu_1(1) = 2$ and $\mu_1(2) = 2$).
Chapter 5
Finite Horizon Models
In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable $k \in \{0, \ldots, N\}$ represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on $k$: $X_k \in \Omega_k^X$.
Decision Space
At each decision epoch, the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on the stage: $u \in \Omega_k^U(i)$.
Dynamic of the System and Transition Probability
In contrast to the deterministic case, the state transition does not depend only on the control used but also on a disturbance $\omega = \omega_k(i, u)$:
$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \ldots, N-1$$
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage $k+1$ is $j$, given that the state and control at stage $k$ are $i$ and $u$. These probabilities can also depend on the stage:
$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$
If the system is stationary (time-invariant), the dynamic function $f$ does not depend on time and the notation for the probability function can be simplified:
$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$
In this case, one refers to a Markov decision process. If a control $u$ is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition $(i, j)$ and action $u$. The costs can also depend on the stage:
$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$
If the transition $(i, j)$ occurs at stage $k$ when the decision is $u$, then the cost $C_k(j, u, i)$ is paid. If the cost function is stationary, the notation is simplified to $C(j, u, i)$.
A terminal cost $C_N(i)$ can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:
$$J^*(X_0) = \min_{U_k \in \Omega_k^U(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$
Subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), \quad k = 0, 1, \ldots, N-1$
$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision action at stage $k$
$\omega_k(i, u)$: probabilistic function of the disturbance
$C_k(j, u, i)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u, \omega)$: dynamic function
$J_0^*(i)$: optimal cost-to-go starting from state $i$
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is
$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} E\left[ C_k(i, u) + J_{k+1}^*(f_k(i, u, \omega)) \right] \quad (5.1)$$
This equation defines a condition for the cost-to-go function of a state $i$ at stage $k$ to be optimal. The equation can be rewritten using the transition probabilities:
$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \sum_{j \in \Omega_{k+1}^X} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \quad (5.2)$$
$\Omega_k^X$: state space at stage $k$
$\Omega_k^U(i)$: decision space at stage $k$ for state $i$
$P_k(j, u, i)$: transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
$$J_N^*(i) = C_N(i) \quad \forall i \in \Omega_N^X \quad \text{(initialisation)}$$
While $k \geq 0$ do:
$$J_k^*(i) = \min_{u \in \Omega_k^U(i)} \sum_{j \in \Omega_{k+1}^X} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \quad \forall i \in \Omega_k^X$$
$$U_k^*(i) = \arg\min_{u \in \Omega_k^U(i)} \sum_{j \in \Omega_{k+1}^X} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \quad \forall i \in \Omega_k^X$$
$$k \leftarrow k - 1$$
$u$: decision variable
$U_k^*(i)$: optimal decision action at stage $k$ for state $i$
The recursion finishes when the first stage is reached
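The stochastic version of the backward recursion can be sketched as follows. This is a hedged illustration: the interface (with `P[k][(i, u)]` mapping next states to probabilities) and the one-stage toy problem are hypothetical, not taken from the thesis.

```python
# A sketch of stochastic backward value iteration; the expected cost of an
# action is sum_j P_k(j, u, i) * [C_k(j, u, i) + J*_{k+1}(j)].

def stochastic_value_iteration(N, states, actions, P, C, C_N):
    """P[k][(i, u)] maps next states j to probabilities P_k(j, u, i)."""
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = C_N(i)                       # initialisation
    for k in range(N - 1, -1, -1):             # backward recursion
        for i in states[k]:
            def expected(u):
                return sum(p * (C(k, j, u, i) + J[k + 1][j])
                           for j, p in P[k][(i, u)].items())
            U[k][i] = min(actions(k, i), key=expected)
            J[k][i] = expected(U[k][i])
    return J, U

# One-stage toy problem: action 0 stays in state 0; action 1 moves to
# state 1 with probability 0.5; the terminal cost of state 1 is 10.
J, U = stochastic_value_iteration(
    N=1, states={0: [0], 1: [0, 1]},
    actions=lambda k, i: [0, 1],
    P=[{(0, 0): {0: 1.0}, (0, 1): {0: 0.5, 1: 0.5}}],
    C=lambda k, j, u, i: 1.0,                  # every transition costs 1
    C_N=lambda i: 10.0 if i == 1 else 0.0)
# J[0][0] == 1.0: staying (cost 1) beats the risky move (0.5*1 + 0.5*11 = 6)
```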
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:
• $N$ stages,
• $N_X$ state variables, where the size of the set for each state variable is $S$,
• $N_U$ control variables, where the size of the set for each control variable is $A$.
The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem thus increases exponentially with the size of the problem (the number of state and decision variables). This characteristic of SDP is called the curse of dimensionality.
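A quick back-of-the-envelope computation makes the growth concrete. The problem sizes below are hypothetical, chosen only to illustrate the operation count:

```python
# Illustrative arithmetic for the curse of dimensionality, using the
# operation count N * S**(2*N_X) * A**(N_U) with hypothetical sizes.
N = 52           # e.g. weekly stages over one year
S, N_X = 10, 3   # three state variables with ten levels each
A, N_U = 4, 2    # two control variables with four choices each
ops = N * S ** (2 * N_X) * A ** N_U
# ops == 52 * 10**6 * 16 == 832_000_000 elementary operations; adding one
# more ten-level state variable multiplies this count by 10**2 = 100
```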
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for a component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be taken into account to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used complementarily.
Of course, maintenance states should be considered in both cases. It could also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while after a major failure a component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If consumption is low, some generation units are stopped, and this time can be used for maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be valuable for optimizing maintenance actions on offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamic of the system only depends on the current state of the system (and possibly on time, if the system dynamic is not stationary).
This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamic of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time. The dynamic of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice, one scarcely faces problems with an infinite number of stages. The infinite horizon can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined similarly to FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution has the form $\pi = \{\mu, \mu, \mu, \ldots\}$, where $\mu$ is a function mapping the state space into the control space: for each $i \in \Omega^X$, $\mu(i)$ is an admissible control for the state $i$, $\mu(i) \in \Omega^U(i)$.
The objective is to find the optimal $\mu^*$, i.e. the policy that minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Three types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models: Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is inevitably reached. When this state is reached, the system remains in it and no further costs are paid.
$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$
Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots$
$\mu$: decision policy
$J^*(i)$: optimal cost-to-go function for state $i$
Discounted problems: Discounted IHSDP models have a cost function that is discounted by a factor $\alpha$, where $\alpha$ is a discount factor ($0 < \alpha < 1$). The cost at stage $k$ has the form $\alpha^k \cdot C_{ij}(u)$.
As $C_{ij}(u)$ is bounded, the infinite sum converges (decreasing geometric progression).
$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$
Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots$
$\alpha$: discount factor
Average cost per stage problems: Infinite horizon problems cannot always be modelled with a cost-free termination state or with discounting.
To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize:
$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$
Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots$
6.2 Optimality Equations
The optimality equations are formulated using the probability function $P_{ij}(u)$.
The stationary policy $\mu^*$, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \quad \forall i \in \Omega^X$$
$J_\mu(i)$: cost-to-go function of policy $\mu$ starting from state $i$
$J^*(i)$: optimal cost-to-go function for state $i$
For an IHSDP discounted problem, the optimality equation is:
$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X$$
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed does. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and $\frac{1}{1-\alpha}$.
For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.
An alternative is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
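The discounted value iteration scheme described above can be sketched as follows. This is a hedged illustration with a sup-norm stopping tolerance; the two-state MDP at the bottom is a hypothetical example, not a model from the thesis.

```python
# A sketch of value iteration for a discounted infinite-horizon MDP, with
# a sup-norm stopping criterion (hypothetical interface and toy data).

def discounted_value_iteration(states, actions, P, C, alpha, tol=1e-9):
    J = {i: 0.0 for i in states}
    while True:
        new_J = {i: min(sum(p * (C(j, u, i) + alpha * J[j])
                            for j, p in P[(i, u)].items())
                        for u in actions(i))
                 for i in states}
        if max(abs(new_J[i] - J[i]) for i in states) < tol:
            return new_J               # stopping criterion reached
        J = new_J

# Two-state example: switching states costs 2, staying costs 1 in state 0
# and nothing in state 1, so the optimal policy moves to state 1 and stays.
P = {(0, 0): {0: 1.0}, (0, 1): {1: 1.0},
     (1, 0): {1: 1.0}, (1, 1): {0: 1.0}}
C = lambda j, u, i: 2.0 if j != i else (1.0 if i == 0 else 0.0)
J = discounted_value_iteration([0, 1], lambda i: [0, 1], P, C, alpha=0.9)
# J is close to {0: 2.0, 1: 0.0}
```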
6.4 The Policy Iteration Algorithm
Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The second step improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively; the process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy $\mu^0$. It can then be described by the following steps:
Step 1: Policy Evaluation
If $\mu^{q+1} = \mu^q$, stop the algorithm. Otherwise, $J_{\mu^q}(i)$ is calculated as the solution of the following linear system:
$$J_{\mu^q}(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J_{\mu^q}(j) \right] \quad \forall i \in \Omega^X$$
$q$: iteration number of the policy iteration algorithm
This is the expected cost-to-go function of the system under the policy $\mu^q$.
Step 2: Policy Improvement
A new policy is obtained with one minimization step of the value iteration algorithm:
$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^q}(j) \right] \quad \forall i \in \Omega^X$$
Go back to the policy evaluation step.
The process stops when $\mu^{q+1} = \mu^q$.
At each iteration the algorithm improves the policy. If the initial policy $\mu^0$ is already good, the algorithm converges quickly to the optimal solution.
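The two-step loop above can be sketched in a few lines. This is a hedged sketch, not the thesis's implementation: it uses a discounted cost (a variant of the formulation above) so that the evaluation step is a well-posed linear system, and the two-state problem at the bottom is hypothetical.

```python
# A sketch of policy iteration for a discounted MDP: evaluate the current
# policy by solving a linear system, then improve it greedily.
import numpy as np

def policy_iteration(states, actions, P, C, alpha, mu0):
    mu = dict(mu0)
    while True:
        # Step 1: policy evaluation -- solve
        # J(i) = sum_j P(j, mu(i), i) * [C(j, mu(i), i) + alpha * J(j)]
        idx = {s: n for n, s in enumerate(states)}
        A = np.eye(len(states))
        b = np.zeros(len(states))
        for i in states:
            for j, p in P[(i, mu[i])].items():
                A[idx[i], idx[j]] -= alpha * p
                b[idx[i]] += p * C(j, mu[i], i)
        J = np.linalg.solve(A, b)
        # Step 2: policy improvement
        q = lambda i, u: sum(p * (C(j, u, i) + alpha * J[idx[j]])
                             for j, p in P[(i, u)].items())
        new_mu = {i: min(actions(i), key=lambda u: q(i, u)) for i in states}
        if new_mu == mu:                 # the policy improves itself: stop
            return mu, {i: J[idx[i]] for i in states}
        mu = new_mu

# Switching states costs 2; staying costs 1 in state 0 and 0 in state 1.
P = {(0, 0): {0: 1.0}, (0, 1): {1: 1.0},
     (1, 0): {1: 1.0}, (1, 1): {0: 1.0}}
C = lambda j, u, i: 2.0 if j != i else (1.0 if i == 0 else 0.0)
mu, J = policy_iteration([0, 1], lambda i: [0, 1], P, C,
                         alpha=0.9, mu0={0: 0, 1: 0})
# mu == {0: 1, 1: 0}: move to state 1 and stay there; J[0] == 2, J[1] == 0
```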
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations $M$ to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu^k}(i)$ that must be chosen higher than the real value $J_{\mu^k}(i)$.
While $m \geq 0$ do:
$$J^m_{\mu^k}(i) = \sum_{j \in \Omega^X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right] \quad \forall i \in \Omega^X$$
$$m \leftarrow m - 1$$
$m$: number of iterations left in the evaluation step of modified policy iteration
The algorithm stops when $m = 0$, and $J_{\mu^k}$ is approximated by $J^0_{\mu^k}$.
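The truncated evaluation step can be sketched as below. This is a hedged illustration of the idea (a fixed number of value-iteration sweeps under a fixed policy), shown here in a discounted variant with hypothetical toy data.

```python
# A sketch of the truncated policy evaluation used in modified policy
# iteration: M sweeps under a fixed policy mu, starting from a guess
# chosen above the true value (discounted variant, hypothetical data).

def evaluate_policy_approx(states, P, C, mu, J_init, M, alpha):
    J = dict(J_init)          # initial guess, above the true cost-to-go
    for _ in range(M):        # m = M, M-1, ..., 1
        J = {i: sum(p * (C(j, mu[i], i) + alpha * J[j])
                    for j, p in P[(i, mu[i])].items())
             for i in states}
    return J                  # J^0 approximates the cost-to-go of mu

P = {(0, 0): {0: 1.0}, (0, 1): {1: 1.0},
     (1, 0): {1: 1.0}, (1, 1): {0: 1.0}}
C = lambda j, u, i: 2.0 if j != i else (1.0 if i == 0 else 0.0)
# Under mu = "move to state 1, then stay", the true costs-to-go are 2 and 0.
J = evaluate_policy_approx([0, 1], P, C, mu={0: 1, 1: 0},
                           J_init={0: 100.0, 1: 100.0}, M=200, alpha=0.9)
# J approaches {0: 2.0, 1: 0.0} as M grows
```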
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy $\mu$ and a state $X \in \Omega^X$, there is a unique $\lambda^\mu$ and vector $h^\mu$ such that:
$$h^\mu(X) = 0$$
$$\lambda^\mu + h^\mu(i) = \sum_{j \in \Omega^X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h^\mu(j) \right] \quad \forall i \in \Omega^X$$
This $\lambda^\mu$ is the average cost-to-go of the stationary policy $\mu$. The average cost-to-go is the same for all starting states.
The optimal average cost $\lambda^*$ and the optimal policy satisfy the Bellman equation:
$$\lambda^* + h^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$
$$\mu^*(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. $X$ is an arbitrary reference state and $h^0(i)$ is chosen arbitrarily.
$$H^k = \min_{u \in \Omega^U(X)} \sum_{j \in \Omega^X} P(j, u, X) \cdot \left[ C(j, u, X) + h^k(j) \right]$$
$$h^{k+1}(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \quad \forall i \in \Omega^X$$
$$\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \quad \forall i \in \Omega^X$$
The sequence $h^k$ converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
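The recursion above can be sketched as follows. This is a hedged illustration for a unichain average-cost MDP; the reference state `X_ref`, the fixed iteration budget, and the toy two-state data are all hypothetical choices.

```python
# A sketch of relative value iteration for a unichain average-cost MDP:
# H^k is the offset computed at the reference state, and h is the
# relative value function (hypothetical interface and toy data).

def relative_value_iteration(states, actions, P, C, X_ref, iters=200):
    h = {i: 0.0 for i in states}
    H = 0.0
    for _ in range(iters):
        q = lambda i, u: sum(p * (C(j, u, i) + h[j])
                             for j, p in P[(i, u)].items())
        H = min(q(X_ref, u) for u in actions(X_ref))   # offset at X_ref
        h = {i: min(q(i, u) for u in actions(i)) - H for i in states}
    return H, h     # H approximates the optimal average cost per stage

P = {(0, 0): {0: 1.0}, (0, 1): {1: 1.0},
     (1, 0): {1: 1.0}, (1, 1): {0: 1.0}}
C = lambda j, u, i: 2.0 if j != i else (1.0 if i == 0 else 0.0)
H, h = relative_value_iteration([0, 1], lambda i: [0, 1], P, C, X_ref=0)
# H == 0.0: in the long run the system stays in state 1 at zero cost
```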
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.
Initialisation: $X$ is chosen arbitrarily.
Step 1: Policy Evaluation
If $\lambda^{q+1} = \lambda^q$ and $h^{q+1}(i) = h^q(i)$ $\forall i \in \Omega^X$, stop the algorithm. Otherwise, solve the system of equations:
$$h^q(X) = 0$$
$$\lambda^q + h^q(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right] \quad \forall i \in \Omega^X$$
Step 2: Policy Improvement
$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right] \quad \forall i \in \Omega^X$$
$$q \leftarrow q + 1$$
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case:
$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X$$
$J^*(i)$ is the solution of the following linear programming model:
Maximize $\sum_{i \in \Omega^X} J(i)$
Subject to $J(i) - \alpha \sum_{j \in \Omega^X} P(j, u, i) \cdot J(j) \leq \sum_{j \in \Omega^X} P(j, u, i) \cdot C(j, u, i) \quad \forall i, u$
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
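As an illustration, such an LP can be assembled and solved with an off-the-shelf solver. This is a hedged sketch using SciPy's `linprog` and a hypothetical two-state MDP; it follows the standard LP form for cost-minimizing discounted MDPs (maximize $\sum_i J(i)$ subject to the constraints above).

```python
# A sketch: solving a discounted MDP as a linear program with SciPy.
# The two-state MDP is a hypothetical example.
from scipy.optimize import linprog

states = [0, 1]
actions = [0, 1]                     # u=0: stay, u=1: switch state
alpha = 0.9
P = {(0, 0): {0: 1.0}, (0, 1): {1: 1.0},
     (1, 0): {1: 1.0}, (1, 1): {0: 1.0}}
def C(j, u, i):                      # cost of the transition i -> j
    if j != i:
        return 2.0                   # switching costs 2
    return 1.0 if i == 0 else 0.0    # staying: 1 in state 0, free in state 1

# Maximize sum_i J(i)  <=>  minimize -sum_i J(i)
c = [-1.0] * len(states)
A_ub, b_ub = [], []
for i in states:
    for u in actions:
        row = [0.0] * len(states)
        row[i] += 1.0
        rhs = 0.0
        for j, p in P[(i, u)].items():
            row[j] -= alpha * p      # J(i) - alpha*sum_j P*J(j) <= sum_j P*C
            rhs += p * C(j, u, i)
        A_ub.append(row)
        b_ub.append(rhs)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
J = res.x                            # optimal cost-to-go, here about (2, 0)
```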
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
Let $n$ and $m$ denote the numbers of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of $n$ and $m$; it is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy $\mu^0$ is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite and the actions are not made continuously (problems of that kind belong to optimal control theory).
SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7

Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and that use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.
The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j | u, i) and costs C(j, u, i).
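Such sample generation from a known model can be sketched as follows. The dict-based model layout and the fixed policy are my assumptions, chosen only to make the sketch self-contained.

```python
import random

def simulate_samples(P, C, policy, x0, n_steps, rng=None):
    """Generate samples (X_k, X_{k+1}, U_k, C_k) by Monte Carlo simulation.

    P[(i, u)]: dict mapping successor state j -> probability P(j|u,i);
    C[(j, u, i)]: transition cost; policy[i]: control used in state i.
    (Hypothetical model layout, assumed for this sketch.)
    """
    rng = rng or random.Random(0)
    samples, x = [], x0
    for _ in range(n_steps):
        u = policy[x]
        js, ps = zip(*P[(x, u)].items())     # successor states and probabilities
        j = rng.choices(js, weights=ps)[0]   # draw the next state
        samples.append((x, j, u, C[(j, u, x)]))
        x = j
    return samples
```

The resulting list can be fed directly to the TD and Q-learning procedures described below, exactly as real-life observations would be.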
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method can be used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that, for each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average-cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed. The cost-to-go resulting from the trajectory starting from the state Xk is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)
V(Xk): cost-to-go of a trajectory starting from state Xk.
If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)
V(i_m): cost-to-go of the trajectory starting from state i at its mth visit.
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.
From a trajectory point of view:

J(Xk) := J(Xk) + γ_{Xk} · [V(Xk) − J(Xk)]

with γ_{Xk} corresponding to 1/m, where m is the number of times Xk has already been visited by trajectories.
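The recursive averaging update above can be sketched as follows. This is a minimal sketch; the trajectory format and the every-visit convention are my assumptions, not prescribed by the text.

```python
def mc_policy_evaluation(trajectories):
    """Monte Carlo estimate of J(i) via the recursive form
    J(i) := J(i) + (1/m) * [V(i_m) - J(i)].

    trajectories: list of [(X_k, C_k), ...] cost sequences observed under a
    fixed policy; every visited state uses the remaining trajectory as a sample.
    """
    J, visits = {}, {}
    for traj in trajectories:
        # Accumulate the cost-to-go V from the end of the trajectory backwards.
        V, tail = [0.0] * len(traj), 0.0
        for k in range(len(traj) - 1, -1, -1):
            tail += traj[k][1]
            V[k] = tail
        # Recursive averaging update, one sample per visit of each state.
        for (x, _), v in zip(traj, V):
            visits[x] = visits.get(x, 0) + 1
            J[x] = J.get(x, 0.0) + (v - J.get(x, 0.0)) / visits[x]
    return J
```

With the step size 1/m, the recursion reproduces exactly the arithmetic mean of the observed cost-to-go samples, as in the batch formula above.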
With the preceding algorithm, V(Xk) must be calculated from the whole trajectory, and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = C(Xk, Xk+1) + V(Xk+1).
At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the lth transition has just been generated. Then J(Xk) is updated for all the states visited previously during the trajectory:

J(Xk) := J(Xk) + γ_{Xk} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]  ∀k = 0, ..., l
TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) := J(Xk) + γ_{Xk} · λ^(l−k) · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]  ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is

J(Xk) := J(Xk) + γ_{Xk} · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
Q-factors. Once J_{μk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_{μk}(i, u) = Σ_{j∈Ω_X} P(j | u, i) · [C(j, u, i) + J_{μk}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q_{μk}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μk} and Q_{μk} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j | u, i) · [C(j, u, i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j | u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3):

Q(i, u) can be initialized arbitrarily.

For each sample (Xk, Xk+1, Uk, Ck), do

Uk = argmin_{u∈Ω_U(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈Ω_U(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
The exploration/exploitation trade-off. Convergence of the algorithms to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

• using the direct learning approach presented in the preceding section for each sample of experience;

• building on-line the model of the transition probabilities and the cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function Jμ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the tabular representation investigated previously, Jμ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.
Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) − J̃(i, r).
There are many possible methods for function approximation; this field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Choose a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
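As a minimal illustration of this supervised learning step, a linear architecture J̃(i, r) = φ(i)ᵀ r can be fitted by least squares to sampled cost-to-go values. The feature map and the data layout below are hypothetical, chosen only to make the sketch self-contained.

```python
import numpy as np

def fit_cost_to_go(states, targets, features):
    """Least-squares fit of J~(i, r) = phi(i)^T r to sampled cost-to-go values.

    states: observed states; targets: sampled V(i) values for those states;
    features: map from a state to its feature vector phi(i)
    (chosen by the modeler, e.g. from insight about the problem).
    """
    Phi = np.array([features(i) for i in states])   # design matrix
    r, *_ = np.linalg.lstsq(Phi, np.array(targets), rcond=None)
    return r

def approx_J(i, r, features):
    """Evaluate the approximation J~(i, r) at any state, visited or not."""
    return float(np.dot(features(i), r))
```

Only the vector r is stored, and the approximation generalizes to states never visited in the samples, which is exactly the point made in the text.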
Chapter 8

Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating-unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance calculated.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Process
Many condition-based maintenance models based on SMDPs have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming. Characteristics: the model can be non-stationary. Possible application: short-term maintenance scheduling. Method: value iteration. Disadvantage: limited state space (number of components).

Markov Decision Processes. Characteristics: stationary model; possible approaches are average cost-to-go, discounted cost and shortest path. Possible application: continuous-time condition monitoring maintenance optimization. Classical methods: value iteration (VI), which can converge fast for a high discount factor; policy iteration (PI), faster in general; linear programming (LP), which allows additional constraints but is limited to smaller state spaces than VI and PI.

Semi-Markov Decision Processes. Characteristics: can optimize the inspection interval (average cost-to-go approach). Possible application: optimization for inspection-based maintenance. Methods: same as MDP. Disadvantage: more complex.

Approximate Dynamic Programming. Characteristics: can handle large state spaces compared with classical MDP methods. Possible application: same as MDP, for larger systems. Methods: TD-learning, Q-learning. Advantage: can work without an explicit model.
Chapter 9

A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was incorporated in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

N_E: number of electricity scenarios
N_W: number of working states for the component
N_PM: number of preventive maintenance states for the component
N_CM: number of corrective maintenance states for the component

Costs

C_E(s, k): electricity price at stage k in electricity scenario s
C_I: cost per stage of interruption
C_PM: cost per stage of preventive maintenance
C_CM: cost per stage of corrective maintenance
C_N(i): terminal cost if the component is in state i

Variables

i1: component state at the current stage
i2: electricity state at the current stage
j1: possible component state for the next stage
j2: possible electricity state for the next stage

State and Control Space

x1_k: component state at stage k
x2_k: electricity state at stage k

Probability functions

λ(t): failure rate of the component at age t
λ(Wi): failure rate of the component in state Wi

Sets

Ω_x1: component state space
Ω_x2: electricity state space
Ω_U(i): decision space for state i

State notations

W: working state
PM: preventive maintenance state
CM: corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),  x1_k ∈ Ω_x1, x2_k ∈ Ω_x2   (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This latter approach was implemented. In both cases, the corresponding number of W states is N_W = Tmax/Ts, or the closest integer.
[Figure 9.1: Example of a Markov decision process for one component, with N_CM = 3, N_PM = 2, N_W = 4. States W0, ..., W4, PM1, CM1, CM2. Solid lines (u = 0): from each Wq to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); W4 loops to itself with probability 1 − Ts·λ(4). Dashed lines (u = 1): from each Wq to PM1 with probability 1. The PM and CM states move to the next maintenance state, and finally to W0, with probability 1.]
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_(NPM−1), CM1, ..., CM_(NCM−1)}
Electricity scenario state

Electricity scenarios are associated with the state variable x2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2: Example of electricity scenarios, N_E = 3. Electricity prices (SEK/MWh, between 200 and 500) for Scenarios 1-3 over stages k−1, k, k+1.]
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW}, and ∅ otherwise.
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_(k+1) = j1, x2_(k+1) = j2 | uk = u, x1_k = i1, x2_k = i2)
= P(x1_(k+1) = j1 | uk = u, x1_k = i1) · P(x2_(k+1) = j2 | x2_k = i2)
= P(j1 | u, i1) · Pk(j2 | i2)
Component state transition probabilities

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1 (respectively CM1) corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2 | i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2 | i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1 | u | j1 | P(j1 | u, i1)
Wq, q ∈ {0, ..., N_W−1} | 0 | Wq+1 | 1 − λ(Wq)
Wq, q ∈ {0, ..., N_W−1} | 0 | CM1 | λ(Wq)
W_NW | 0 | W_NW | 1 − λ(W_NW)
W_NW | 0 | CM1 | λ(W_NW)
Wq, q ∈ {0, ..., N_W} | 1 | PM1 | 1
PMq, q ∈ {1, ..., N_PM−2} | ∅ | PMq+1 | 1
PM_(NPM−1) | ∅ | W0 | 1
CMq, q ∈ {1, ..., N_CM−2} | ∅ | CMq+1 | 1
CM_(NCM−1) | ∅ | W0 | 1
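Table 9.1 can be encoded as transition matrices, one per decision. This is a sketch under my own assumptions: the state ordering, the saturating-age convention at W_NW, and placing the decision-free PM/CM rows in the u = 0 matrix are choices made here, not prescribed by the thesis.

```python
import numpy as np

def component_transitions(n_w, n_pm, n_cm, lam):
    """Encode Table 9.1 as transition matrices (sketch, n_pm >= 2, n_cm >= 2).

    States are indexed [W0..W_{n_w}, PM1..PM_{n_pm-1}, CM1..CM_{n_cm-1}];
    lam[q] is the per-stage failure probability in state Wq (assumed given).
    Returns {0: P0, 1: P1}; PM/CM rows sit in P0 since no decision applies.
    """
    n = (n_w + 1) + (n_pm - 1) + (n_cm - 1)
    W = lambda q: q                          # index of Wq
    PM = lambda q: n_w + q                   # index of PMq, q >= 1
    CM = lambda q: n_w + (n_pm - 1) + q      # index of CMq, q >= 1
    P0, P1 = np.zeros((n, n)), np.zeros((n, n))
    for q in range(n_w + 1):
        P0[W(q), W(min(q + 1, n_w))] = 1 - lam[q]   # ageing (W_{n_w} saturates)
        P0[W(q), CM(1)] = lam[q]                    # failure
        P1[W(q), PM(1)] = 1.0                       # preventive replacement
    for q in range(1, n_pm - 1):
        P0[PM(q), PM(q + 1)] = 1.0                  # PM in progress
    P0[PM(n_pm - 1), W(0)] = 1.0                    # PM finished: as good as new
    for q in range(1, n_cm - 1):
        P0[CM(q), CM(q + 1)] = 1.0                  # CM in progress
    P0[CM(n_cm - 1), W(0)] = 1.0                    # CM finished
    return {0: P0, 1: P1}
```

With N_W = 4, N_PM = 2, N_CM = 3 this reproduces the structure of Figure 9.1, and every constructed row sums to one, as a stochastic matrix must.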
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = | 1 0 0 |    P2_E = | 1/3 1/3 1/3 |    P3_E = | 0.6 0.2 0.2 |
       | 0 1 0 |           | 1/3 1/3 1/3 |           | 0.2 0.6 0.2 |
       | 0 0 1 |           | 1/3 1/3 1/3 |           | 0.2 0.2 0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2|i2):  P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• reward for electricity generation: G · Ts · C_E(i2, k) (depends on the electricity scenario state i2 and the stage k);

• cost for maintenance: C_CM or C_PM;

• cost for interruption: C_I.

Moreover, a terminal cost C_N(i), defined for each possible terminal state i of the component, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4: Transition costs

i1 | u | j1 | Ck(j, u, i)
Wq, q ∈ {0, ..., N_W−1} | 0 | Wq+1 | G · Ts · C_E(i2, k)
Wq, q ∈ {0, ..., N_W−1} | 0 | CM1 | C_I + C_CM
W_NW | 0 | W_NW | G · Ts · C_E(i2, k)
W_NW | 0 | CM1 | C_I + C_CM
Wq | 1 | PM1 | C_I + C_PM
PMq, q ∈ {1, ..., N_PM−2} | ∅ | PMq+1 | C_I + C_PM
PM_(NPM−1) | ∅ | W0 | C_I + C_PM
CMq, q ∈ {1, ..., N_CM−2} | ∅ | CMq+1 | C_I + C_CM
CM_(NCM−1) | ∅ | W0 | C_I + C_CM
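With stationary transition matrices and stage-dependent costs in place, the finite horizon model can be solved by backward induction (value iteration). The sketch below is hedged: it uses a generic `stage_cost` callback standing in for the combined generation reward, maintenance and interruption costs of Table 9.4, and a row-sum-zero convention for actions that are not allowed in a state; both conventions are mine, not the thesis's.

```python
import numpy as np

def backward_induction(P, stage_cost, terminal_cost, N):
    """Finite horizon DP sketch: J_k(i) = min_u [ c_k(i,u) + sum_j P J_{k+1}(j) ].

    P: dict u -> stationary transition matrix over component states;
    stage_cost(k, i, u): expected one-stage cost (hypothetical callback);
    terminal_cost: vector C_N(i). Returns cost-to-go J[k, i] and policy mu[k, i].
    """
    n = len(terminal_cost)
    J = np.zeros((N + 1, n))
    mu = np.zeros((N, n), dtype=int)
    J[N] = terminal_cost                    # boundary condition at stage N
    for k in range(N - 1, -1, -1):          # backward over stages
        for i in range(n):
            best_u, best_cost = 0, np.inf
            for u in P:
                if P[u][i].sum() == 0:      # action not allowed in this state
                    continue
                cost = stage_cost(k, i, u) + P[u][i] @ J[k + 1]
                if cost < best_cost:
                    best_u, best_cost = u, cost
            mu[k, i], J[k, i] = best_u, best_cost
    return J, mu
```

The complexity is O(N · n² · |U|) per the nested loops, which illustrates concretely why the number of states (and hence of components) must stay moderate in the finite horizon approach.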
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunisticmaintenance It is sometimes possible to do maintenance on different parts of thesystem at opportunistic times For example if the system fails it could be profitableto do maintenance on some components of the system that are still working butshould be maintained soon
This could be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c
Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, …, NC}    State of component c at the current stage
iNC+1                 Electricity state at the current stage
jc, c ∈ {1, …, NC}    State of component c at the next stage
jNC+1                 Electricity state at the next stage
uc, c ∈ {1, …, NC}    Decision variable for component c
State and Control Space

xc,k, c ∈ {1, …, NC}    State of component c at stage k
xc                      A component state
xNC+1,k                 Electricity state at stage k
uc,k                    Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets

Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, …, NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• At each stage it is possible to decide to replace a component to prevent corrective maintenance. A preventive replacement of component c takes NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is carried out on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh is produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal state of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector, as in (9.2):

Xk = (x1,k, …, xNC,k, xNC+1,k)T    (9.2)

where xc,k, c ∈ {1, …, NC}, represents the state of component c and xNC+1,k represents the electricity state.
Component Space
The numbers of CM and PM states for component c are NCMc and NPMc, respectively. The number of working states NWc for each component c is decided in the same way as for the one-component model.

The state space related to component c is noted Ωxc:

xc,k ∈ Ωxc = {W0, …, WNWc, PM1, …, PMNPMc−1, CM1, …, CMNCMc−1}
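A minimal sketch of how the state space Ωxc could be enumerated, assuming the string encoding W0, …, WNWc, PM1, …, PMNPMc−1, CM1, …, CMNCMc−1 above (the function name and encoding are illustrative assumptions, not part of the thesis):

```python
# Sketch: build the state space of component c from the counts N_W, N_PM,
# N_CM defined in the notations (string encoding is an assumption).

def component_state_space(N_W, N_PM, N_CM):
    working = [f"W{q}" for q in range(N_W + 1)]   # W0 .. W_{N_W}
    pm = [f"PM{q}" for q in range(1, N_PM)]       # PM1 .. PM_{N_PM - 1}
    cm = [f"CM{q}" for q in range(1, N_CM)]       # CM1 .. CM_{N_CM - 1}
    return working + pm + cm
```

For instance, with N_W = 3, N_PM = 2 and N_CM = 3 the space is W0–W3, PM1, CM1 and CM2.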
Electricity Space
Same as in Section 8.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:
uc,k = 0: no preventive maintenance on component c
uc,k = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1,k, u2,k, …, uNC,k)T    (9.3)

The decision space for each decision variable is defined by:

∀c ∈ {1, …, NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, …, WNWc}, and Ωuc(ic) = ∅ otherwise.
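The decision space above can be sketched as a small helper (the string encoding of states is an assumption carried over from the state-space sketch):

```python
# Sketch: decision space for component c in state ic, following the
# definition above: {0, 1} in a working state, empty otherwise.

def decision_space(ic):
    return (0, 1) if ic.startswith("W") else ()
```

A component under maintenance thus offers no decision, while a working component can either be left alone or preventively replaced.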
9.2.4.3 Transition Probability
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, …, jNC) | (u1, …, uNC), (i1, …, iNC)) · P(jNC+1 | iNC+1)    (9.5)

The transition probabilities of the electricity states, P(jNC+1 | iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 8.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.
Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, …, NC}: ic ∈ {W0, …, WNWc},

P((j1, …, jNC) | 0, (i1, …, iNC)) = ∏_{c=1}^{NC} P(jc | 0, ic)
Case 2
If one of the components is in maintenance, or preventive maintenance is decided for some component, then

P((j1, …, jNC) | (u1, …, uNC), (i1, …, iNC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc | 1, ic)    if uc = 1 or ic ∉ {W0, …, WNWc}
P^c = 1                if ic ∈ {W0, …, WNWc}, uc = 0 and jc = ic
P^c = 0                otherwise
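The two cases can be sketched as follows. This is an illustrative reading of the model, with assumed encodings: states as strings, decisions as 0/1 tuples, and P_c a list of per-component transition functions P_c[c](jc, uc, ic):

```python
# Sketch of the two transition cases above (encodings are assumptions).

def joint_transition_prob(j, u, i, P_c):
    """P((j1..jNC) | (u1..uNC), (i1..iNC)) for the multi-component system.

    j, u, i : tuples of next states, decisions and current states
    P_c     : list of per-component functions P_c[c](jc, uc, ic)
    """
    working = lambda s: s.startswith("W")
    system_up = all(working(ic) for ic in i) and all(uc == 0 for uc in u)
    prob = 1.0
    for c, (jc, uc, ic) in enumerate(zip(j, u, i)):
        if system_up:
            # Case 1: every component ages independently
            prob *= P_c[c](jc, 0, ic)
        elif uc == 1 or not working(ic):
            # Case 2: maintenance starts or progresses
            prob *= P_c[c](jc, 1, ic)
        else:
            # Case 2: system down, working components do not age
            prob *= 1.0 if jc == ic else 0.0
    return prob
```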
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, …, NC}: ic, jc ∈ {W0, …, WNWc},

C((j1, …, jNC) | 0, (i1, …, iNC)) = G · Ts · CE(iNC+1, k)
Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, …, jNC) | (u1, …, uNC), (i1, …, iNC)) = CI + Σ_{c=1}^{NC} C^c

with

C^c = CCMc    if ic ∈ {CM1, …, CMNCMc−1} or jc = CM1
C^c = CPMc    if ic ∈ {PM1, …, PMNPMc−1} or jc = PM1
C^c = 0       otherwise
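The two cost cases can be sketched in the same style (assumed encodings as in the transition-probability sketch; the production reward is again taken as a negative cost, an assumption of the sketch):

```python
# Sketch of the two cost cases above (encodings are assumptions).

def joint_cost(j, u, i, k, i_el, C_E, G, Ts, C_PM, C_CM, C_I):
    """Stage cost C((j1..jNC) | (u1..uNC), (i1..iNC)) at stage k.

    i_el       : electricity state; C_E(i_el, k) the electricity price
    C_PM, C_CM : per-component lists of maintenance costs per stage
    """
    working = lambda s: s.startswith("W")
    if all(working(ic) for ic in i) and all(uc == 0 for uc in u) \
            and all(working(jc) for jc in j):
        # Case 1: the system produces during the whole stage (reward)
        return -G * Ts * C_E(i_el, k)
    # Case 2: interruption cost plus every ongoing maintenance action
    cost = C_I
    for c, (jc, ic) in enumerate(zip(j, i)):
        if ic.startswith("CM") or jc == "CM1":
            cost += C_CM[c]
        elif ic.startswith("PM") or jc == "PM1":
            cost += C_PM[c]
    return cost
```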
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid untractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of a finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
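The backward recursion above can be checked numerically. The arc costs below are transcribed from the stage-by-stage computation (node names A–J with terminal node T); the function is a generic memoized backward value iteration for acyclic shortest-path problems, an illustrative sketch rather than the thesis's code:

```python
# Backward value iteration on the shortest-path example; arc costs are
# transcribed from the stage-by-stage computation above.

arcs = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 4, "F": 6},
    "C": {"E": 2, "F": 1, "G": 3},
    "D": {"F": 5, "G": 2},
    "E": {"H": 2, "I": 5},
    "F": {"H": 7, "I": 3, "J": 2},
    "G": {"I": 1, "J": 2},
    "H": {"T": 4},
    "I": {"T": 2},
    "J": {"T": 7},
}

def shortest_costs(arcs, terminal="T"):
    """J*(n) = min over successors m of c(n, m) + J*(m), memoized."""
    J = {terminal: 0.0}
    def value(n):
        if n not in J:
            J[n] = min(c + value(m) for m, c in arcs[n].items())
        return J[n]
    for n in arcs:
        value(n)
    return J
```

Running it confirms J*(A) = 8, with the optimal path A-D-G-I-T.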
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] A.-H. Mohamed. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006 (RAMS '06), pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of the 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006 (PMAPS 2006), International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999 (PICA '99), Proceedings of the 21st IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006 (PMAPS 2006), International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006 (NAPS 2006), 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006 (WCICA 2006), The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Chapter 2
Maintenance
The context of maintenance optimization is briefly described in this chapter. The different types of maintenance are defined in Section 2.1, and some maintenance optimization models are reviewed in Section 2.2.

2.1 Types of Maintenance

Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 2.1 shows a general picture of the different types of maintenance.

Corrective Maintenance (CM) is carried out after fault recognition and intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way to detect or prevent a failure, or when doing so is not worthwhile.

Preventive maintenance aims at undertaking maintenance actions on a component before it fails, in order to e.g. avoid the high costs of replacement, unsupplied power delivery and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:

1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.
Figure 2.1: Maintenance tree, based on [1]. Maintenance is divided into Preventive Maintenance and Corrective Maintenance; Preventive Maintenance comprises Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), the latter being continuous, scheduled or inspection based.
2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.
2.2 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive corrective maintenance. Preventive maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in very high costs.

The aim of maintenance optimization can be to balance corrective and preventive maintenance in order to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influencing factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.
In this section the most common classes of models are described and some referencesare given This short review is based on Chapter 8 of [4]
2.2.1 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson distribution (a Poisson process with a non-stationary rate). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.
2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, …), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gear box, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 4) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (monitored variables).
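The proportional hazards assumption can be written as h(t, z) = h0(t) · exp(β·z). A small sketch with an illustrative Weibull baseline; all parameter values and names here are assumptions, not taken from [25]:

```python
import math

# Proportional hazards sketch: h(t, z) = h0(t) * exp(beta . z), with an
# illustrative Weibull baseline hazard (shape/scale values are assumed).

def hazard(t, z, beta, shape=2.0, scale=1000.0):
    """Hazard rate at age t given monitored covariates z."""
    h0 = (shape / scale) * (t / scale) ** (shape - 1)   # Weibull baseline
    return h0 * math.exp(sum(b * zi for b, zi in zip(beta, z)))
```

With a coefficient beta_i = ln(2), increasing the corresponding covariate by one unit doubles the hazard at any age, which illustrates the multiplicative separation of time and covariate effects.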
2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities for performing preventive maintenance. With the failure of one component, it is possible to perform PM on other components. This can be interesting for offshore wind farms, for example: transportation to the wind farm by boat or helicopter is necessary and can be very expensive, so by grouping maintenance actions money can be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.
A rolling horizon dynamic programming algorithm is proposed in [45] to take intoaccount the short term information The model can be used for many maintenanceoptimization models
2.2.5 Other Types of Models and Criteria of Classification

Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g. cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g. multi-component models are more interesting in power systems. The time horizon considered in the model is also important: many articles consider an infinite time horizon, but more focus should be put on finite horizons since they are more practical. Another characteristic of the model is the time representation, discrete or continuous. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution; a model that cannot be solved is of no interest. For some models exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power
System
This chapter gives a brief description of electrical power systems Some costs andconstraints for a maintenance model are proposed
3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables with limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: the generation units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: the transmission system is composed of high voltage, high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.
3. Distribution: the distribution system is at a voltage level below transmission and connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: consumers can be divided into different categories, such as industry, commercial, household, office, agriculture, etc. The costs for interruption are in general different for the different categories of consumers; these costs also depend on the time of outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the actions necessary to avoid dangerous situations). The components of the system influence each other: if a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as blackouts.
3.1.2 Maintenance in Power Systems

The objective is to find the right way to do maintenance: corrective and preventive maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).
Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
32 Costs
Possible costsincomes related to maintenance in power systems have been identified(non-inclusively) as follows
bull Manpower cost Cost for the maintenance team that performs maintenanceactions
bull Spare part cost The cost of a new component is an important part of themaintenance cost
bull Maintenance equipment cost If special equipment is needed for undertakingthe maintenance An helicopter can sometime be necessary for the mainte-nance of some parts of an off-shore wind turbine
bull Energy production The electricity produce is sold to consumers on the elec-tricity market The price of electricity can fluctuate At the same time thepower produce by a generating power unit can fluctuate depending on factorslike the weather (for renewable energy) The condition of the unit can alsoinfluence its efficiency
bull Unserved energyInterruption cost If there is an agreement to producedeliverenergy to a consumer at some specific time unserved energy must be paidThe cost depends on the contract and the cost per unit time depends on theduration of the failure
• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: the size and availability of the maintenance staff is limited.
• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.
• Weather: the weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.
• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.
• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.
• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible input for an optimization model.
• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that satisfies the principle of optimality:
An optimal policy has the property that, whatever the initial state and initial decision are, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the actual evolution of the system and the possible actions.
Basically, in maintenance problems this would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the actual state and the action made.
If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the actual state and action choice. The system is then referred to as probabilistic or stochastic.
Functional failures are in general represented as stochastic events. Consequently, stochastic maintenance optimization models are of interest.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of a system is indeed very long.
16
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval of time between two stages will have an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 6. Continuous decision making refers to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for $N$ stages.
State and Decision Spaces
At each stage $k$, the system is in a state $X_k = i$ that belongs to a state space $\Omega^X_k$. Depending on the state of the system, the decision maker decides on an action $u = U_k \in \Omega^U_k(i)$.
Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be $X_{k+1} = f_k(i, u)$. Moreover, the action has a cost that the decision maker has to pay, $C_k(i, u)$. A possible terminal cost $C_N(X_N)$ is associated to the terminal state (the state at stage $N$).
Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:
$$J^*_0(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k), \quad k = 0, \ldots, N-1$
$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision/action at stage $k$
$C_k(i, u)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u)$: dynamic function
$J^*_0(i)$: optimal cost-to-go starting from state $i$
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage $k$ can be derived with the following formula:
$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \qquad (4.1)$$

$J^*_k(i)$: optimal cost-to-go from stage $k$ to $N$, starting from state $i$
The value iteration algorithm is a direct consequence of the optimality equation:
$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega^X_N$$
$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \quad \forall i \in \Omega^X_k$$
$$U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \quad \forall i \in \Omega^X_k$$

$u$: decision variable
$U^*_k(i)$: optimal decision/action at stage $k$ for state $i$
The algorithm goes backwards, starting from the last stage. It stops when $k = 0$.
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: the shortest path example, a directed graph with node A at stage 0; nodes B, C, D at stage 1; E, F, G at stage 2; H, I, J at stage 3; and node K at stage 4. Each arc between consecutive stages is labelled with its cost; the arc labels did not survive extraction.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated to each arc. A first way to solve the problem would be to calculate the cost of all possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: $n = 5$, with $k = 0, 1, 2, 3, 4$.
State Space
The state space is defined for each stage:
$$\Omega^X_0 = \{A\} = \{0\}, \quad \Omega^X_1 = \{B, C, D\} = \{0, 1, 2\}, \quad \Omega^X_2 = \{E, F, G\} = \{0, 1, 2\},$$
$$\Omega^X_3 = \{H, I, J\} = \{0, 1, 2\}, \quad \Omega^X_4 = \{K\} = \{0\}$$
Each node of the problem is defined by a state $X_k$. For example, $X_2 = 1$ corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which $X_k$ would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:
$$\Omega^U_k(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \qquad \text{for } k = 1, 2, 3$$

$$\Omega^U_0(0) = \{0, 1, 2\} \qquad \text{for } k = 0$$
For example, $\Omega^U_1(0) = \Omega^U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition $B \Rightarrow E$ and $U_1(0) = 1$ for the transition $B \Rightarrow F$.
Another example: $\Omega^U_1(2) = \Omega^U(D) = \{1, 2\}$, with $u_1(2) = 1$ for the transition $D \Rightarrow F$ and $u_1(2) = 2$ for the transition $D \Rightarrow G$.
A sequence $\pi = \{\mu_0, \mu_1, \ldots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state $i$ at stage $k$ to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu^*_0, \mu^*_1, \ldots, \mu^*_N\}$.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: $f_k(i, u) = u$.
The transition costs are defined as equal to the distance from one state to the resulting state of the decision. For example, $C_1(0, 0) = C(B \Rightarrow E) = 4$. The cost function is defined in the same way for the other stages and states.
Objective Function
$$J^*_0(0) = \min_{U_k \in \Omega^U_k(X_k)} \left[ \sum_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k), \quad k = 0, 1, 2, 3$
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem. The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.
The solution of the algorithm is given in Appendix A.
The optimal cost-to-go is $J^*_0(0) = 8$. It corresponds to the following path: $A \Rightarrow D \Rightarrow G \Rightarrow I \Rightarrow K$. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$ with $\mu_k(i) = u^*_k(i)$ (for example, $\mu_1(1) = 2$ and $\mu_1(2) = 2$).
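The backward recursion of the example can be sketched in a few lines of code. Note that only the arc costs that appear in the text are certain (A-B = 2, B-E = 4, B-F = 6, F-J = 2, J-K = 7); the remaining costs below are illustrative values, chosen only so that the optimal cost-to-go (8) and the optimal path A-D-G-I-K match the solution above.

```python
# Backward value iteration for the shortest path example of Section 4.2.3.
# Arc costs: those stated in the text are kept; the rest are invented
# placeholders consistent with J*_0(A) = 8 and the path A-D-G-I-K.
cost = {
    'A': {'B': 2, 'C': 4, 'D': 3},
    'B': {'E': 4, 'F': 6},
    'C': {'E': 4, 'F': 5, 'G': 2},
    'D': {'F': 5, 'G': 2},
    'E': {'H': 2, 'I': 5},
    'F': {'H': 7, 'I': 3, 'J': 2},
    'G': {'I': 1, 'J': 2},
    'H': {'K': 1}, 'I': {'K': 2}, 'J': {'K': 7},
}
stages = [['A'], ['B', 'C', 'D'], ['E', 'F', 'G'], ['H', 'I', 'J'], ['K']]

J = {'K': 0}          # terminal cost is zero at the final stage
policy = {}
# Backward recursion: J*_k(i) = min_u [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]
for k in range(len(stages) - 2, -1, -1):
    for i in stages[k]:
        u = min(cost[i], key=lambda j: cost[i][j] + J[j])
        J[i] = cost[i][u] + J[u]
        policy[i] = u

# Recover the optimal path forwards from A, as described in the text.
path, node = ['A'], 'A'
while node != 'K':
    node = policy[node]
    path.append(node)

print(J['A'], path)
```

With these costs the recursion returns the cost-to-go 8 and the path A-D-G-I-K, mirroring the solution in Appendix A.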
Chapter 5
Finite Horizon Models
In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as follows:
State Space
A variable $k \in \{0, \ldots, N\}$ represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on $k$: $X_k \in \Omega^X_k$.
Decision Space
At each decision epoch, the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on the stage: $u \in \Omega^U_k(i)$.
Dynamics of the System and Transition Probabilities
Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance $\omega = \omega_k(i, u)$:
$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \ldots, N-1$$
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage $k+1$ is $j$, given that the state and control at stage $k$ are $i$ and $u$. These probabilities can also depend on the stage:
$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$
If the system is stationary (time-invariant), the dynamic function $f$ does not depend on time, and the notation for the probability function can be simplified:
$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$
In this case, one refers to a Markov decision process. If a control $u$ is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
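This last point is easy to illustrate numerically. In the sketch below, the three deterioration states, the two actions and all probabilities are invented for illustration (they are not taken from the thesis); fixing one action per state turns the family of transition probabilities into a single stochastic matrix, i.e. an ordinary Markov chain:

```python
import numpy as np

# Hypothetical 3-state component (0 = good, 1 = degraded, 2 = failed).
# P[u] holds the transition matrix P(j | i, u) for action u:
# u = 0: do nothing, u = 1: replace (back to state 0 with certainty).
P = {
    0: np.array([[0.8, 0.15, 0.05],
                 [0.0, 0.7,  0.3 ],
                 [0.0, 0.0,  1.0 ]]),
    1: np.array([[1.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0]]),
}

# A stationary policy mu fixes one action per state ...
mu = {0: 0, 1: 0, 2: 1}   # maintain only after a failure

# ... which induces a single Markov chain: row i of P_mu is P(. | i, mu(i)).
P_mu = np.array([P[mu[i]][i] for i in range(3)])
assert np.allclose(P_mu.sum(axis=1), 1.0)   # each row is a distribution
print(P_mu)
```

The resulting matrix can then be analysed with standard Markov chain tools (stationary distribution, absorption, etc.).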
Cost Function
A cost is associated to each possible transition $(i, j)$ and action $u$. The costs can also depend on the stage:
$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$
If the transition $(i, j)$ occurs at stage $k$ when the decision is $u$, then the cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, the notation is simplified to $C(i, u, j)$.
A terminal cost $C_N(i)$ can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:
$$J^*(X_0) = \min_{U_k \in \Omega^U_k(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), \quad k = 0, 1, \ldots, N-1$
$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision/action at stage $k$
$\omega_k(i, u)$: probabilistic function of the disturbance
$C_k(i, u, j)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u, \omega)$: dynamic function
$J^*_0(i)$: optimal cost-to-go starting from state $i$
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:
$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} E\left[ C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega)) \right] \qquad (5.1)$$
This equation defines a condition for the cost-to-go function of a state $i$ at stage $k$ to be optimal. The equation can be rewritten using the transition probabilities:
$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \qquad (5.2)$$
$\Omega^X_k$: state space at stage $k$
$\Omega^U_k(i)$: decision space at stage $k$ for state $i$
$P_k(j, u, i)$: transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts at the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega^X_N \qquad \text{(initialisation)}$$

While $k \ge 0$ do:

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega^X_k$$

$$U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega^X_k$$

$$k \leftarrow k - 1$$
$u$: decision variable
$U^*_k(i)$: optimal decision/action at stage $k$ for state $i$
The recursion finishes when the first stage is reached.
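The backward recursion above can be sketched compactly. The two-state component below (0 = working, 1 = failed), the two actions (0 = wait, 1 = repair) and all probabilities and costs are invented for illustration, not taken from the thesis:

```python
import numpy as np

N = 10                    # number of stages
S, A = 2, 2               # 2 states, 2 actions
# P[u][i][j] = P(X_{k+1} = j | X_k = i, U_k = u), stationary for simplicity
P = np.array([[[0.9, 0.1],    # wait, from working: may fail
               [0.0, 1.0]],   # wait, from failed: stays failed
              [[1.0, 0.0],    # repair, from working: stays working
               [1.0, 0.0]]])  # repair, from failed: back to working
# C[u][i][j] = transition cost: repairing costs 5, landing in the
# failed state costs 10 (illustrative values)
C = np.zeros((A, S, S))
C[1] += 5.0
C[:, :, 1] += 10.0

J = np.zeros(S)           # terminal cost C_N(i) = 0
policy = np.zeros((N, S), dtype=int)
for k in range(N - 1, -1, -1):
    # Q[u, i] = sum_j P(j | i, u) * (C(i, u, j) + J_{k+1}(j))
    Q = (P * (C + J)).sum(axis=2)
    policy[k] = Q.argmin(axis=0)   # U*_k(i)
    J = Q.min(axis=0)              # J*_k(i)

print(J)          # expected cost-to-go from each state at k = 0
print(policy[0])  # optimal first-stage action per state
```

With these numbers the recursion recommends waiting while the component works and repairing once it has failed, which matches intuition for the chosen costs.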
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:
• $N$ stages
• $N_X$ state variables, where the size of the set for each state variable is $S$
• $N_U$ control variables, where the size of the set for each control variable is $A$
The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
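The exponential blow-up is easy to see numerically; the figures below ($N = 52$ stages, $S = 10$ levels per state variable, $A = 3$ actions) are illustrative choices, not taken from the thesis:

```python
# Operation count N * S^(2*Nx) * A^(Nu) for a growing number of state variables.
N, S, A, Nu = 52, 10, 3, 1   # e.g. weekly stages over a year, 10 levels per variable
for Nx in range(1, 5):
    ops = N * S ** (2 * Nx) * A ** Nu
    print(f"{Nx} state variable(s): {ops:.1e} operations")
```

Each added state variable multiplies the work by $S^2 = 100$ here, so four variables already require on the order of $10^{10}$ operations per optimization run.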
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for a component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered in order to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.
Of course, maintenance states should be considered in both cases. It would also be possible to have different types of failure states, such as major and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but, in return, increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Also, if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing the maintenance actions of offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).
This condition of loss of memory is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added in the DP model to keep the preceding states that can be visited in memory. The computational price is once again very high.
For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice, one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution has the form $\pi = \{\mu, \mu, \mu, \ldots\}$, where $\mu$ is a function mapping the state space to the control space: for $i \in \Omega^X$, $\mu(i)$ is an admissible control for the state $i$, $\mu(i) \in \Omega^U(i)$.
The objective is to find the optimal $\mu^*$, i.e. the one that minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no more costs are paid.
$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$
$\mu$: decision policy
$J^*(i)$: optimal cost-to-go function for state $i$
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor $\alpha$, the discount factor ($0 < \alpha < 1$). The cost incurred at stage $k$ has the form $\alpha^k \cdot C_{ij}(u)$.
As $C_{ij}(u)$ is bounded, the infinite sum will converge (as a decreasing geometric progression).
$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$
$\alpha$: discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.
To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:
$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$
6.2 Optimality Equations
The optimality equations are formulated using the transition probability function.
The stationary policy $\mu^*$ that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):
$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \quad \forall i \in \Omega^X$$
$J_\mu(i)$: cost-to-go function of policy $\mu$ starting from state $i$
$J^*(i)$: optimal cost-to-go function for state $i$
For an IHSDP discounted problem, the optimality equation is:
$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X$$
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and $\frac{1}{1-\alpha}$.
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative criterion must be determined to stop the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
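Value iteration for a discounted MDP can be sketched as below, on an invented two-state repair example (state 0 = working, 1 = failed; action 0 = wait, 1 = repair; all probabilities and costs are illustrative, not from the thesis). The loop stops when successive value functions differ by less than a tolerance:

```python
import numpy as np

alpha = 0.9               # discount factor
# P[u][i][j] = P(j | i, u); C[u][i][j] = transition cost
P = np.array([[[0.9, 0.1], [0.0, 1.0]],
              [[1.0, 0.0], [1.0, 0.0]]])
C = np.zeros((2, 2, 2))
C[1] += 5.0               # repairing costs 5
C[:, :, 1] += 10.0        # landing in the failed state costs 10

J = np.zeros(2)
while True:
    # Bellman backup: Q[u, i] = sum_j P(j | i, u) * (C(i, u, j) + alpha * J(j))
    Q = (P * (C + alpha * J)).sum(axis=2)
    J_new = Q.min(axis=0)
    if np.max(np.abs(J_new - J)) < 1e-8:
        break
    J = J_new

mu = Q.argmin(axis=0)     # greedy stationary policy
print(J, mu)
```

Because the backup is a contraction with modulus $\alpha$, the stopping tolerance directly bounds the distance to $J^*$.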
6.4 The Policy Iteration Algorithm
Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the actual policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy $\mu_0$. It can then be described by the following steps:
Step 1: Policy Evaluation

If $\mu_{q+1} = \mu_q$, stop the algorithm. Otherwise, $J_{\mu_q}(i)$, the solution of the following linear system, is calculated:

$$J_{\mu_q}(i) = \sum_{j \in \Omega^X} P(j, \mu_q(i), i) \cdot \left[ C(j, \mu_q(i), i) + J_{\mu_q}(j) \right] \quad \forall i \in \Omega^X$$
$q$: iteration number of the policy iteration algorithm
This is the expected cost-to-go function of the system using the policy $\mu_q$.
Step 2: Policy Improvement

A new policy is obtained using one step of the value iteration algorithm:

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu_q}(j) \right] \quad \forall i \in \Omega^X$$
Go back to the policy evaluation step.
The process stops when $\mu_{q+1} = \mu_q$.
At each iteration the algorithm improves the policy. If the initial policy $\mu_0$ is already good, the algorithm will converge quickly to the optimal solution.
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations $M$ to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu_k}(i)$ that must be chosen higher than the real value $J_{\mu_k}(i)$.
32
While $m \ge 0$ do:

$$J^m_{\mu_k}(i) = \sum_{j \in \Omega^X} P(j, \mu_k(i), i) \cdot \left[ C(j, \mu_k(i), i) + J^{m+1}_{\mu_k}(j) \right] \quad \forall i \in \Omega^X$$

$$m \leftarrow m - 1$$
$m$: number of iterations left in the evaluation step of modified policy iteration
The algorithm stops when $m = 0$, and $J_{\mu_k}$ is approximated by $J^0_{\mu_k}$.
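In code, this amounts to replacing the exact linear solve of policy iteration by $M$ backup sweeps under the fixed policy. The sketch below uses an invented two-state repair example (illustrative data, initial guess deliberately above the true cost-to-go as required):

```python
import numpy as np

alpha, M = 0.9, 20
P = np.array([[[0.9, 0.1], [0.0, 1.0]],
              [[1.0, 0.0], [1.0, 0.0]]])
C = np.zeros((2, 2, 2))
C[1] += 5.0
C[:, :, 1] += 10.0
n = 2

def evaluate(mu, J):
    """Approximate policy evaluation: M value-iteration sweeps under fixed mu."""
    P_mu = P[mu, np.arange(n)]
    C_mu = (P_mu * C[mu, np.arange(n)]).sum(axis=1)
    for _ in range(M):
        J = C_mu + alpha * P_mu @ J
    return J

mu = np.zeros(n, dtype=int)
J = np.full(n, 200.0)      # initial value chosen above the true cost-to-go
for _ in range(50):        # outer improvement loop
    J = evaluate(mu, J)
    Q = (P * (C + alpha * J)).sum(axis=2)
    mu_new = Q.argmin(axis=0)
    if np.array_equal(mu_new, mu):
        break
    mu = mu_new

print(mu, J)
```

Each outer iteration now costs $M$ matrix-vector products instead of one linear solve, which is the point of the modification for large state spaces.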
6.6 Average Cost-to-go Problems
The methods presented in Sections 6.3-6.5 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated, and the convergence of the algorithms requires conditions on the Markov decision process. An average cost-to-go problem can be reformulated as a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy $\mu$ and a state $\bar{X} \in \Omega^X$, there are a unique scalar $\lambda_\mu$ and a vector $h_\mu$ such that:

$$h_\mu(\bar{X}) = 0$$
$$\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega^X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right] \quad \forall i \in \Omega^X$$
This $\lambda_\mu$ is the average cost-to-go of the stationary policy $\mu$. The average cost-to-go is the same for all starting states.
The optimal average cost and the optimal policy satisfy the Bellman equation:
$$\lambda^* + h^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$

$$\mu^*(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; the method is then called relative value iteration. $\bar{X}$ is an arbitrary state and $h^0(i)$ is chosen arbitrarily:
$$H^k = \min_{u \in \Omega^U(\bar{X})} \sum_{j \in \Omega^X} P(j, u, \bar{X}) \cdot \left[ C(j, u, \bar{X}) + h^k(j) \right]$$

$$h^{k+1}(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \quad \forall i \in \Omega^X$$

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \quad \forall i \in \Omega^X$$
The sequence $h^k$ will converge if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, however, infinite in theory.
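The iteration above can be sketched on an invented two-state repair example (state 0 = working, 1 = failed; actions wait/repair; probabilities and costs are illustrative assumptions, and the induced chains are unichain and aperiodic so the sketch converges). The reference state is $\bar{X} = 0$, and $H^k$ converges to the optimal average cost per stage:

```python
import numpy as np

P = np.array([[[0.9, 0.1], [0.0, 1.0]],
              [[1.0, 0.0], [1.0, 0.0]]])
C = np.zeros((2, 2, 2))
C[1] += 5.0                # repair cost
C[:, :, 1] += 10.0         # cost of landing in the failed state

h = np.zeros(2)            # h^0 chosen arbitrarily; reference state is 0
for _ in range(500):
    Q = (P * (C + h)).sum(axis=2)   # Q[u, i] = sum_j P(j|i,u)(C(i,u,j) + h(j))
    H = Q[:, 0].min()               # H^k: value at the reference state
    h_new = Q.min(axis=0) - H       # relative values h^{k+1}
    if np.max(np.abs(h_new - h)) < 1e-10:
        break
    h = h_new

mu = Q.argmin(axis=0)
lam = H                    # at convergence H approaches the optimal average cost
print(lam, mu)
```

For these numbers the long-run policy is to wait while working and repair after a failure, and the average cost per stage settles at 15/11.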
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.
Initialisation: $\bar{X}$ can be chosen arbitrarily.
Step 1: Evaluation of the policy

If $\lambda_{q+1} = \lambda_q$ and $h_{q+1}(i) = h_q(i)$ $\forall i \in \Omega^X$, stop the algorithm. Otherwise, solve the system of equations:

$$h_q(\bar{X}) = 0$$
$$\lambda_q + h_q(i) = \sum_{j \in \Omega^X} P(j, \mu_q(i), i) \cdot \left[ C(j, \mu_q(i), i) + h_q(j) \right] \quad \forall i \in \Omega^X$$
Step 2: Policy improvement

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h_q(j) \right] \quad \forall i \in \Omega^X$$

$$q \leftarrow q + 1$$
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case, the optimality equation is:
$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X$$
$J^*(i)$ is then the solution of the following linear programming model:
$$\text{Maximize } \sum_{i \in \Omega^X} J(i)$$

$$\text{subject to } J(i) \le \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J(j) \right] \quad \forall i \in \Omega^X, \; \forall u \in \Omega^U(i)$$
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
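The LP formulation can be sketched with `scipy.optimize.linprog` on an invented two-state repair example (illustrative probabilities and costs; maximizing $\sum_i J(i)$ is done by minimizing its negative, since `linprog` minimizes):

```python
import numpy as np
from scipy.optimize import linprog

alpha = 0.9
P = np.array([[[0.9, 0.1], [0.0, 1.0]],
              [[1.0, 0.0], [1.0, 0.0]]])
C = np.zeros((2, 2, 2))
C[1] += 5.0
C[:, :, 1] += 10.0
n_states, n_actions = 2, 2

# One constraint per (state, action):
#   J(i) - alpha * sum_j P(j|i,u) J(j) <= expected one-step cost of (i, u)
A_ub, b_ub = [], []
for u in range(n_actions):
    for i in range(n_states):
        row = -alpha * P[u, i]
        row[i] += 1.0
        A_ub.append(row)
        b_ub.append((P[u, i] * C[u, i]).sum())

# Maximize sum_i J(i)  <=>  minimize -sum_i J(i); J is unbounded in sign.
res = linprog(c=-np.ones(n_states), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n_states)
J = res.x
print(J)
```

The LP optimum coincides with the fixed point of the discounted optimality equation, which can be checked against a value iteration run on the same data.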
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If $n$ and $m$ denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of $n$ and $m$. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. But linear programming methods become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, the algorithm will converge quite fast if the initial policy $\mu_0$ is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Processes
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, but the actions are not made continuously (that kind of problem refers to optimal control theory).
SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
36
Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.
The aim of this chapter is to give an overview of RL. For further reading, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided from simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i), if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average-cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.
The cost-to-go resulting from the trajectory, starting from the state Xk, is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): cost-to-go of a trajectory starting from state X_k.
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i, m)

V(i, m): cost-to-go of the trajectory starting from state i after the m-th visit.
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i, m) − J(i)], with γ = 1/m, where m is the number of samples obtained so far.
From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

with γ_{X_k} corresponding to 1/m, where m is the number of times X_k has already been visited by trajectories.
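A minimal sketch of this incremental averaging, assuming a simulator that returns a full trajectory and its per-transition costs (the callable and state encoding are hypothetical, not from the thesis):

```python
def mc_policy_evaluation(simulate_trajectory, num_trajectories):
    """Estimate J(i) by incrementally averaging observed costs-to-go.

    simulate_trajectory() -> (states, costs): a trajectory X0..XN under the
    policy and the observed transition costs C(Xk, Xk+1) (hypothetical API).
    """
    J = {}       # state -> current estimate of its cost-to-go
    visits = {}  # state -> number of samples seen so far (the m above)
    for _ in range(num_trajectories):
        states, costs = simulate_trajectory()
        # backward pass: cost-to-go V(Xk) of the tail of the trajectory
        cost_to_go, V = [0.0] * len(states), 0.0
        for k in range(len(states) - 2, -1, -1):
            V += costs[k]
            cost_to_go[k] = V
        # incremental averaging with step gamma = 1/m
        for k, x in enumerate(states[:-1]):
            visits[x] = visits.get(x, 0) + 1
            gamma = 1.0 / visits[x]
            J[x] = J.get(x, 0.0) + gamma * (cost_to_go[k] - J.get(x, 0.0))
    return J
```

With γ = 1/m this recursion reproduces the plain average over all visits, which is why it is equivalent to the batch estimate above.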
39
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).
At each transition of the trajectory, the cost-to-go function J(X_k) is updated for all the states that have been visited previously during the trajectory. Assume that the l-th transition has just been generated. Then

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
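The TD(0) rule can be sketched as follows; a constant step size stands in for γ_{X_k} (a simplifying assumption), and the sample format is illustrative:

```python
def td0_evaluation(transitions, step_size=0.1):
    """TD(0) policy evaluation from observed one-step samples.

    transitions: iterable of (Xk, Xk_next, cost) tuples; terminal states
    simply never appear as Xk. Names are illustrative, not from the thesis.
    """
    J = {}
    for x, x_next, cost in transitions:
        j, j_next = J.get(x, 0.0), J.get(x_next, 0.0)
        # temporal difference: one-step cost plus current tail estimate
        J[x] = j + step_size * (cost + j_next - j)
    return J
```

Each update only needs the current transition, which is what makes the reformulation usable on-line.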
Q-factors. Once J^{μk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^{μk}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J^{μk}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q^{μk}(i, u)
This is in fact an approximate version of the policy iteration algorithm, since J^{μk} and Q^{μk} have been estimated using the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)    (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]    (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The exploration/exploitation trade-off. Convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.
In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
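An illustrative Q-learning sketch with an ε-greedy trade-off between exploration and exploitation; the toy simulator, the constant step size and the ε value are assumptions, not from the thesis. Following the update above, γ is the step size (not a discount factor), and a terminal state (cost-to-go zero) ends an episode:

```python
import random

def q_learning(sample_next, states, actions, episodes, gamma=0.5, eps=0.2):
    """Q-learning for a stochastic shortest path problem.

    sample_next(x, u) -> (x_next, cost) simulates one transition;
    x_next is None at the terminal state (an illustrative convention).
    """
    Q = {(x, u): 0.0 for x in states for u in actions}
    for _ in range(episodes):
        x = random.choice(states)
        while x is not None:
            if random.random() < eps:                     # exploration
                u = random.choice(actions)
            else:                                         # exploitation (greedy)
                u = min(actions, key=lambda a: Q[(x, a)])
            x_next, cost = sample_next(x, u)
            tail = 0.0 if x_next is None else min(Q[(x_next, a)] for a in actions)
            Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * (cost + tail)
            x = x_next
    return Q
```

The greedy action is read off the estimated Q-factors at the end, so no separate policy evaluation step is needed.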
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation, using direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J^μ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J^μ. In the table representation previously investigated, J^μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.
Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).
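As a minimal illustration of such a parametric structure, a linear approximation J̃(i, r) = r0 + r1·φ(i) can be fitted by least squares on sampled cost-to-go values; the feature φ and the samples below are hypothetical:

```python
def fit_linear_cost_to_go(samples, phi):
    """Least-squares fit of the approximation J~(i, r) = r0 + r1 * phi(i).

    samples: list of (state, observed cost-to-go) pairs; phi: a scalar
    feature of the state, chosen from insight about the problem.
    """
    n = len(samples)
    xs = [phi(i) for i, _ in samples]
    ys = [v for _, v in samples]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # closed-form simple linear regression (normal equations)
    r1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    r0 = mean_y - r1 * mean_x
    return r0, r1
```

Only the two parameters r = (r0, r1) are stored, instead of one table entry per state.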
There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Choose a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the one performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
Chapter 8

Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], a SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure, during the stage, of a unit not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization, using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature considered only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an existing model of the system. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Model | Characteristics | Possible Application in Maintenance Optimization | Method | Advantages / Disadvantages
Finite Horizon Dynamic Programming | Model can be non-stationary | Short-term maintenance optimization, scheduling | Value Iteration | Limited state space (number of components)
Markov Decision Processes | Stationary model; possible approaches: average cost-to-go, discounted, shortest path | Continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted) | Classical methods for MDP: Value Iteration (VI), Policy Iteration (PI), Linear Programming | VI can converge fast for a high discount factor; PI is faster in general; LP allows additional constraints but a more limited state space than VI & PI
Approximate Dynamic Programming for MDP | Can handle large state spaces | Same as MDP, for larger systems | TD-learning, Q-learning | Can work without an explicit model
Semi-Markov Decision Processes | Can optimize the inspection interval (average cost-to-go approach) | Optimization of inspection-based maintenance | Same as MDP | Complex
Chapter 9

A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component, and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.
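The value iteration algorithm for such a finite horizon model is the standard backward recursion; the sketch below assumes generic callables P(k, j, u, i), C(k, j, u, i) and a terminal cost, mirroring the notation of this chapter but not taken verbatim from it:

```python
def finite_horizon_value_iteration(states, actions, P, C, terminal_cost, N):
    """Backward recursion over stages k = N-1, ..., 0.

    P(k, j, u, i): transition probability; C(k, j, u, i): transition cost;
    actions(i): decision space for state i (all hypothetical signatures).
    Returns the stage-0 cost-to-go and the optimal policy per stage.
    """
    J = {i: terminal_cost(i) for i in states}   # stage-N terminal costs
    policy = []
    for k in range(N - 1, -1, -1):
        J_new, mu = {}, {}
        for i in states:
            best_u, best = None, float("inf")
            for u in actions(i):
                expected = sum(P(k, j, u, i) * (C(k, j, u, i) + J[j])
                               for j in states)
                if expected < best:
                    best_u, best = u, expected
            J_new[i], mu[i] = best, best_u
        J = J_new
        policy.insert(0, mu)
    return J, policy
```

The state and decision spaces of the models below (component state plus electricity scenario, replace or not) plug directly into this scheme.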
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was incorporated in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

NE: number of electricity scenarios
NW: number of working states for the component
NPM: number of preventive maintenance states for the component
NCM: number of corrective maintenance states for the component
Costs

CE(s, k): electricity cost at stage k in electricity state s
CI: cost per stage for interruption
CPM: cost per stage of preventive maintenance
CCM: cost per stage of corrective maintenance
CN(i): terminal cost if the component is in state i
Variables

i1: component state at the current stage
i2: electricity state at the current stage
j1: possible component state for the next stage
j2: possible electricity state for the next stage
State and Control Space

x1_k: component state at stage k
x2_k: electricity state at stage k
Probability functions

λ(t): failure rate of the component at age t
λ(i): failure rate of the component in state Wi
Sets

Ω_x1: component state space
Ω_x2: electricity state space
Ω_U(i): decision space for state i
State notations

W: working state
PM: preventive maintenance state
CM: corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption of CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k),   x1_k ∈ Ω_x1, x2_k ∈ Ω_x2    (9.1)
Ω_x1 is the set of possible states for the component, and Ω_x2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond to NCM and NPM, respectively.

To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can correspond, for example, to the time when λ(t) exceeds 50%. This second approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
[Figure: state transition diagram with states W0–W4, PM1, CM1, CM2; under u = 0, each Wq goes to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); maintenance states progress deterministically (probability 1).]
Figure 9.1: Example of a Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}
Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios, corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
[Figure: electricity price (SEK/MWh, ranging from about 200 to 500) as a function of the stage (k−1, k, k+1) for scenarios 1–3.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance
The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW},  ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,
P(X_{k+1} = j | U_k = u, X_k = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · P_k(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q·Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E or P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1 | u | j1 | P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | λ(Wq)
W_NW | 0 | W_NW | 1 − λ(W_NW)
W_NW | 0 | CM1 | λ(W_NW)
Wq, q ∈ {0, ..., NW} | 1 | PM1 | 1
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | 1
PM_{NPM−1} | ∅ | W0 | 1
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | 1
CM_{NCM−1} | ∅ | W0 | 1
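The rows of Table 9.1 can be assembled into a stochastic matrix for a fixed decision u; the state encoding and the per-stage failure probabilities lam[q] below are illustrative assumptions:

```python
def component_transition_matrix(n_w, n_pm, n_cm, lam, u):
    """Transition matrix of Table 9.1 for one decision u (0 or 1).

    States are ordered W0..W_{NW}, PM1..PM_{NPM-1}, CM1..CM_{NCM-1};
    lam[q] is the failure probability per stage in state Wq (assumed given).
    """
    states = ([f"W{q}" for q in range(n_w + 1)]
              + [f"PM{q}" for q in range(1, n_pm)]
              + [f"CM{q}" for q in range(1, n_cm)])
    idx = {s: n for n, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for q in range(n_w + 1):
        if u == 1:                        # preventive replacement decided
            P[idx[f"W{q}"]][idx["PM1" if n_pm > 1 else "W0"]] = 1.0
            continue
        ageing = f"W{min(q + 1, n_w)}"    # W_NW ages no further
        P[idx[f"W{q}"]][idx[ageing]] += 1.0 - lam[q]
        P[idx[f"W{q}"]][idx["CM1" if n_cm > 1 else "W0"]] += lam[q]
    for q in range(1, n_pm):              # maintenance progresses one stage
        P[idx[f"PM{q}"]][idx[f"PM{q + 1}" if q < n_pm - 1 else "W0"]] = 1.0
    for q in range(1, n_cm):
        P[idx[f"CM{q}"]][idx[f"CM{q + 1}" if q < n_cm - 1 else "W0"]] = 1.0
    return states, P
```

With NW = 4, NPM = 2, NCM = 3 this reproduces the state set of Figure 9.1; every row of the result is a probability distribution.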
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = [1 0 0; 0 1 0; 0 0 1]

P2_E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]

P3_E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
P_k(j2, i2) | P1_E | P1_E | P1_E | P3_E | P3_E | P2_E | P2_E | P2_E | P3_E | P1_E | P1_E | P1_E
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• reward for electricity generation, G·Ts·CE(i2, k) (depends on the electricity scenario state i2 and the stage k);

• cost for maintenance, CCM or CPM;

• cost for interruption, CI.

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
A possible terminal cost CN(i) is defined for each possible terminal state i of the component.
Table 9.4: Transition costs

i1 | u | j1 | Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | G·Ts·CE(i2, k)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | CI + CCM
W_NW | 0 | W_NW | G·Ts·CE(i2, k)
W_NW | 0 | CM1 | CI + CCM
Wq | 1 | PM1 | CI + CPM
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | CI + CPM
PM_{NPM−1} | ∅ | W0 | CI + CPM
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | CI + CCM
CM_{NCM−1} | ∅ | W0 | CI + CCM
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon anyway.
This could be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers

NC: number of components
NWc: number of working states for component c
NPMc: number of preventive maintenance states for component c
NCMc: number of corrective maintenance states for component c
Costs

CPMc: cost per stage of preventive maintenance for component c
CCMc: cost per stage of corrective maintenance for component c
CNc(i): terminal cost if component c is in state i
Variables

ic, c ∈ {1, ..., NC}: state of component c at the current stage
i_{NC+1}: electricity state at the current stage
jc, c ∈ {1, ..., NC}: state of component c for the next stage
j_{NC+1}: electricity state for the next stage
uc, c ∈ {1, ..., NC}: decision variable for component c
State and Control Space

xc_k, c ∈ {1, ..., NC}: state of component c at stage k
xc: a component state
x^{NC+1}_k: electricity state at stage k
uc_k: maintenance decision for component c at stage k
Probability functions

λc(i): failure probability function for component c
Sets

Ω_xc: state space for component c
Ω_{x_{NC+1}}: electricity state space
Ω_uc(ic): decision space for component c in state ic
9.2.3 Assumptions
bull The system is composed of NC components in series If one component failsthe whole system fails
bull The failure rate of each component over the time is assumed perfectly knownThis function is noted λc(t) for component c isin 1 NC
bull If component c fails during stage k corrective maintenance is undertaken forNCMc stages with a cost of CCMc per stage
bull It is possible at each stage to decide to replace a component to prevent cor-rective maintenance The time of preventive replacement for component n isNPMc stages with a cost of CPMc per stage
56
bull An interruption cost CI is consider whatever the maintenance is done on thesystem
bull The average production of the generating unit is G kW If none of the compo-nent of the unit is in preventive maintenance or failure G middotTs kWh is producedduring the stage (Ts in hours)
bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c
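The assumption of a perfectly known failure rate can be made concrete: the probability of failing during a given stage follows from integrating λc(t) over that stage. A minimal Python sketch; the Weibull rate, the function names, and the stage length are illustrative assumptions, not values from the thesis:

```python
import math

# Sketch: turn a known failure rate function lambda_c(t) into the probability
# of failing during stage k of length Ts hours, given survival to its start:
# P(fail in stage k) = 1 - exp(-integral over the stage of lambda_c(t) dt).
# The Weibull-shaped rate below is an illustrative assumption.
def failure_rate(t, beta=2.0, eta=50000.0):
    return (beta / eta) * (t / eta) ** (beta - 1)

def stage_failure_prob(k, Ts=24.0, n=100):
    t0, dt = k * Ts, Ts / n
    # midpoint rule for the integral of the rate over [k*Ts, (k+1)*Ts]
    integral = sum(failure_rate(t0 + (i + 0.5) * dt) * dt for i in range(n))
    return 1.0 - math.exp(-integral)
```

With an increasing failure rate, later stages carry a larger conditional failure probability, which is what drives preventive replacement in the model.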
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x_{1,k}, ..., x_{NC,k}, x_{NC+1,k})^T    (9.2)

x_{c,k}, c ∈ {1, ..., NC}, represents the state of component c, and x_{NC+1,k} represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for the one-component model.

The state space related to component c is noted Ω_{x_c}:

x_{c,k} ∈ Ω_{x_c} = {W0, ..., W_{NWc}, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}
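The component state space defined above can be enumerated explicitly. A small sketch; the label strings and the function name are hypothetical, and the sizes follow the set notation above:

```python
# Sketch: enumerate the state space Omega_xc of one component as a list of
# labels, following the W/PM/CM structure above (hypothetical label names).
def component_state_space(NW, NPM, NCM):
    working = [f"W{i}" for i in range(NW + 1)]   # W0 ... W_NW
    pm = [f"PM{i}" for i in range(1, NPM)]       # PM1 ... PM_{NPM-1}
    cm = [f"CM{i}" for i in range(1, NCM)]       # CM1 ... CM_{NCM-1}
    return working + pm + cm

print(component_state_space(2, 3, 2))
# ['W0', 'W1', 'W2', 'PM1', 'PM2', 'CM1']
```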
Electricity Space
Same as in Section 8.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

u_{c,k} = 0: no preventive maintenance on component c
u_{c,k} = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u_{1,k}, u_{2,k}, ..., u_{NC,k})^T    (9.3)

The decision space for each decision variable can be defined by, for all c ∈ {1, ..., NC}:

Ω_{u_c}(i_c) = {0, 1} if i_c ∈ {W0, ..., W_{NWc}}
Ω_{u_c}(i_c) = ∅ otherwise
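This decision space can be sketched as a function; the state labels follow the state-space section above, and the names are illustrative:

```python
# Sketch of the decision space Omega_uc(ic): preventive maintenance can only
# be decided from a working state W0 ... W_NW; otherwise no decision exists.
def decision_space(ic, NW):
    working = {f"W{i}" for i in range(NW + 1)}
    return {0, 1} if ic in working else set()  # empty set: component in maintenance

print(decision_space("W1", NW=2))   # {0, 1}
print(decision_space("CM1", NW=2))  # set()
```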
9.2.4.3 Transition Probability

The state variables x_c are independent of the electricity state x_{NC+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)    (9.4)
= P((j_1, ..., j_{NC}) | (u_1, ..., u_{NC}), (i_1, ..., i_{NC})) · P(j_{NC+1} | i_{NC+1})    (9.5)

The transition probabilities of the electricity states, P(j_{NC+1} | i_{NC+1}), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 8.1.
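As an illustration of such a stage transition matrix for the electricity state, a sketch with three hypothetical price states; all numbers are assumptions, not the matrix of Section 8.1:

```python
# Sketch: a transition matrix for the electricity state (rows: current state,
# columns: next state), e.g. low / medium / high price.  Illustrative numbers.
P_elec = [
    [0.7, 0.2, 0.1],  # from low
    [0.2, 0.6, 0.2],  # from medium
    [0.1, 0.3, 0.6],  # from high
]
# each row is a probability distribution over next states
assert all(abs(sum(row) - 1.0) < 1e-9 for row in P_elec)
```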
Component State Transitions

The state variables x_c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: i_c ∈ {W1, ..., W_{NWc}},

P((j_1, ..., j_{NC}) | 0, (i_1, ..., i_{NC})) = ∏_{c=1}^{NC} P(j_c | 0, i_c)

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is taken for some component,

P((j_1, ..., j_{NC}) | (u_1, ..., u_{NC}), (i_1, ..., i_{NC})) = ∏_{c=1}^{NC} P_c

with

P_c = P(j_c | 1, i_c) if u_c = 1 or i_c ∉ {W1, ..., W_{NWc}}
P_c = 1 if i_c ∈ {W1, ..., W_{NWc}}, u_c = 0, and j_c = i_c
P_c = 0 otherwise
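The product structure of these joint transition probabilities can be sketched numerically. The per-component probability tables below are made-up illustrations, not data from the thesis:

```python
# Sketch of the product form: when components evolve independently, the joint
# transition probability is the product of the per-component probabilities.
# Each P_c is a hypothetical dictionary P_c[(j, u, i)] = P(j | u, i).
def joint_transition_prob(P, j_states, u, i_states):
    prob = 1.0
    for P_c, jc, uc, ic in zip(P, j_states, u, i_states):
        prob *= P_c.get((jc, uc, ic), 0.0)  # missing entries have probability 0
    return prob

# two identical components: stay in W1 with prob 0.9, fail to CM1 with prob 0.1
P1 = {("W1", 0, "W1"): 0.9, ("CM1", 0, "W1"): 0.1}
print(joint_transition_prob([P1, P1], ["W1", "W1"], [0, 0], ["W1", "W1"]))
# close to 0.81
```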
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided, and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: i_c ∈ {W1, ..., W_{NWc}},

C((j_1, ..., j_{NC}), 0, (i_1, ..., i_{NC})) = G · Ts · CE(i_{NC+1}, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j_1, ..., j_{NC}), (u_1, ..., u_{NC}), (i_1, ..., i_{NC})) = CI + ∑_{c=1}^{NC} C_c

with

C_c = CCMc if i_c ∈ {CM1, ..., CM_{NCMc−1}} or j_c = CM1
C_c = CPMc if i_c ∈ {PM1, ..., PM_{NPMc−1}} or j_c = PM1
C_c = 0 otherwise
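The two-case stage cost can be sketched as a function. Treating the production reward as a negative cost is a convention chosen here for the sketch, and all numbers, names, and the label scheme are illustrative:

```python
# Sketch of the two-case stage cost: either the unit produces G*Ts kWh valued
# at the electricity price CE, or an interruption cost CI plus the per-stage
# maintenance costs of every component under (or entering) maintenance.
def stage_cost(states, decisions, CI, CPM, CCM, G, Ts, CE):
    in_maintenance = [s.startswith(("PM", "CM")) for s in states]
    if not any(in_maintenance) and not any(decisions):
        return -G * Ts * CE  # reward (negative cost) for the energy sold
    cost = CI
    for c, s in enumerate(states):
        if s.startswith("CM"):
            cost += CCM[c]
        elif s.startswith("PM") or decisions[c] == 1:
            cost += CPM[c]
    return cost

print(stage_cost(["W1", "W2"], [0, 0], CI=100, CPM=[10, 12], CCM=[30, 40],
                 G=500, Ts=24, CE=0.05))   # -600.0
```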
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities to the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest. However, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of a finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4, u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2, u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7, u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2
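This backward recursion can be reproduced numerically. A Python sketch, with the arc costs transcribed from the computations of the appendix:

```python
# Value iteration on the shortest-path example of this appendix.
# C[k][(i, j)]: cost of going from state i at stage k to state j at stage k+1.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},
    1: {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3,
        (2, 1): 5, (2, 2): 2},
    2: {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2,
        (2, 1): 1, (2, 2): 2},
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},
}

J = {4: {0: 0}}           # terminal cost phi(0) = 0
policy = {}
for k in (3, 2, 1, 0):    # backward recursion over stages
    J[k], policy[k] = {}, {}
    for i in {s for (s, _) in C[k]}:
        # minimize over successor states j reachable from state i
        best_j = min((j for (s, j) in C[k] if s == i),
                     key=lambda j: C[k][(i, j)] + J[k + 1][j])
        J[k][i] = C[k][(i, best_j)] + J[k + 1][best_j]
        policy[k][i] = best_j

print(J[0][0])   # optimal cost from node A: 8
```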
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Figure 2.1: Maintenance tree, based on [1]. (The tree divides Maintenance into Preventive Maintenance, itself split into Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), where CBM can be continuous, scheduled, or inspection based, and Corrective Maintenance.)
2. Condition Based Maintenance is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. It corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements, or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.
2.2 Maintenance Optimization Models
Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in a very high cost.

The aim of maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influence factors. Cho et al. [15], Dekker et al. [16], and Nicolai et al. [31] focus mainly on multi-component problems.

In this section the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].
2.2.1 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow and Proschan [7] describe a basic age replacement model.
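The trade-off behind age replacement can be illustrated numerically with the classical long-run cost-rate criterion g(T) = (c_p R(T) + c_f (1 - R(T))) / ∫_0^T R(t) dt, where R is the survival function, c_p the preventive and c_f the corrective replacement cost. A sketch in Python; the Weibull lifetime and the cost values are illustrative assumptions, not taken from [7] or [17]:

```python
import math

# Sketch of the age-replacement trade-off: replace preventively at age T
# (cost c_p) or at failure (cost c_f > c_p), minimizing the long-run cost rate
# g(T) = (c_p*R(T) + c_f*(1-R(T))) / E[cycle length],
# with E[cycle length] = integral_0^T R(t) dt.  All numbers are illustrative.
c_p, c_f = 1.0, 5.0
beta, eta = 2.5, 10.0                        # Weibull shape/scale (IFR case)
R = lambda t: math.exp(-(t / eta) ** beta)   # survival function

def cost_rate(T, n=1000):
    dt = T / n
    mean_cycle = sum(R(i * dt) * dt for i in range(n))  # numerical integral
    return (c_p * R(T) + c_f * (1 - R(T))) / mean_cycle

# crude grid search for the best replacement age
T_best = min((0.1 * i for i in range(1, 300)), key=cost_rate)
```

With an increasing failure rate and c_f > c_p, the optimal T is finite; with a constant failure rate, preventive replacement would never pay off.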
A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.
2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow and Proschan [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to capture that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2], the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9], a Semi-Markov Decision Process (SMDP, see Chapter 4) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (monitored variables).
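The proportional hazards assumption can be sketched directly: the hazard splits into a baseline time function multiplied by a function of the monitored covariates. The baseline and coefficients below are illustrative assumptions, not the fitted model of [25]:

```python
import math

# Sketch of the proportional hazards assumption: h(t, z) = h0(t) * exp(gamma.z),
# a baseline hazard in time multiplied by a function of the monitored
# covariates z.  Weibull baseline and coefficients are illustrative.
def hazard(t, z, beta=2.0, eta=8.0, gamma=(0.4, 1.1)):
    baseline = (beta / eta) * (t / eta) ** (beta - 1)       # Weibull h0(t)
    covariate_effect = math.exp(sum(g * zi for g, zi in zip(gamma, z)))
    return baseline * covariate_effect
```

The defining property is that the ratio of hazards for two covariate vectors does not depend on time, which is what makes the monitored variables separable from ageing.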
2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities to perform preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: transportation to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money can be saved.

Haurie and L'Ecuyer [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short-term information. The model can be used for many maintenance optimization models.
2.2.5 Other Types of Models and Criteria of Classification

Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g., multi-component models are more interesting in power systems. The time horizon considered in the model is important: many articles consider an infinite time horizon, but more focus should be put on finite horizons, since they are more practical. Another characteristic of a model is the time representation: whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution; a model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution, and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation. These are the generation units that produce the power. They can be, e.g., hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission. The transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.

3. Distribution. The distribution system is a voltage level below transmission which is connected to customers. It connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption. The consumers can be divided into different categories: industry, commerce, households, offices, agriculture, etc. The costs of interruption are in general different for the different categories of consumers. These costs also depend on the duration of the outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as blackouts.
3.1.2 Maintenance in Power Systems

The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber and Bertling [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).

Research on power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
3.2 Costs

Possible costs and incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost. Cost for the maintenance team that performs the maintenance actions.

• Spare part cost. The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost. Special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production. The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost. If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost. Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement, or test is done on an asset).
3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:

• Manpower. The size and availability of the maintenance staff is limited.

• Maintenance equipment. The equipment needed for undertaking the maintenance must be available.

• Weather. The weather can force certain maintenance actions to be postponed; e.g., in very windy conditions it is not possible to perform maintenance on offshore wind farms.

• Availability of spare parts. If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed; the transportation takes time and has a price.

• Maintenance contracts. Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information. If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data. Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action results in an immediate cost (or reward) and influences the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.
411 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems
It can be applied to any problem that observes the principle of optimality
15
An optimal policy has the property that whatever the initial state andoptimal first decision may be the remaining decisions constitute an op-timal policy with regard to the state resulting from the first decision[8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. Previous decisions must not influence the present evolution of the system and the possible actions.
In maintenance problems this basically means that maintenance actions only have an effect on the state of the system directly after their accomplishment; they do not influence the deterioration process after they have been completed.
412 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the present state and the action taken.
If a system is subject to probabilistic events, it will evolve according to a probability distribution that depends on the present state and the action chosen. The system is then referred to as probabilistic, or stochastic.
Functional failures are in general represented as stochastic events. Consequently, stochastic maintenance optimization models are of interest.
413 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner at all times. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
414 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 7). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the time interval between two stages will have an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuous set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 6. Continuous decision making refers to optimal control theory and will not be discussed here.
415 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 42).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
42 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example, a simple shortest path problem.
421 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).
Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost that the decision maker has to pay, Ck(i, u). A possible terminal cost CN(XN) is associated with the terminal state (the state at stage N).
Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:
J*0(X0) = min_{Uk} [ Σ_{k=0}^{N−1} Ck(Xk, Uk) + CN(XN) ]

Subject to Xk+1 = fk(Xk, Uk), k = 0, ..., N − 1

N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
Xk: state at stage k
Uk: decision (action) at stage k
Ck(i, u): cost function
CN(i): terminal cost for state i
fk(i, u): dynamic function
J*0(i): optimal cost-to-go starting from state i
422 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:
J*k(i) = min_{u ∈ ΩUk(i)} [ Ck(i, u) + J*k+1(fk(i, u)) ]   (41)

J*k(i): optimal cost-to-go from stage k to N, starting from state i
The value iteration algorithm is a direct consequence of the optimality equation:
J*N(i) = CN(i)   ∀i ∈ ΩXN

J*k(i) = min_{u ∈ ΩUk(i)} [ Ck(i, u) + J*k+1(fk(i, u)) ]   ∀i ∈ ΩXk

U*k(i) = argmin_{u ∈ ΩUk(i)} [ Ck(i, u) + J*k+1(fk(i, u)) ]   ∀i ∈ ΩXk

u: decision variable
U*k(i): optimal decision (action) at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
423 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: a five-stage directed graph. Stage 0: node A; Stage 1: nodes B, C, D; Stage 2: nodes E, F, G; Stage 3: nodes H, I, J; Stage 4: node K. Arc costs: A→B = 2, A→C = 4, A→D = 3; B→E = 4, B→F = 6; C→E = 2, C→F = 1, C→G = 3; D→F = 5, D→G = 2; E→H = 2, E→I = 5; F→H = 7, F→I = 3, F→J = 2; G→I = 1, G→J = 2; H→K = 4, I→K = 2, J→K = 7.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of every possible path. For example, the path A-B-F-J-K has a cost of 2+6+2+7 = 17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4231 Problem Formulation
The problem is divided into five stages: k = 0, 1, 2, 3, 4, with terminal stage N = 4.
State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}
Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from a node to reach the next stage. The following notations are used:

ΩUk(i) = {0, 1} for i = 0; {0, 1, 2} for i = 1; {1, 2} for i = 2, for k = 1, 2, 3

ΩU0(0) = {0, 1, 2} for k = 0
For example, ΩU1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E, or U1(0) = 1 for the transition B ⇒ F.
Another example: ΩU1(2) = ΩU(D) = {1, 2}, with U1(2) = 1 for the transition D ⇒ F, or U1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ0, μ1, ..., μN}, where μk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*0, μ*1, ..., μ*N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: fk(i, u) = u.
The transition costs are defined as the distance from one state to the state resulting from the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J*0(0) = min_{Uk ∈ ΩUk(Xk)} [ Σ_{k=0}^{3} Ck(Xk, Uk) + C4(X4) ]

Subject to Xk+1 = fk(Xk, Uk), k = 0, 1, 2, 3
4232 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solutions determined by the DP algorithm for the sequence of states that is visited.
The solutions of the algorithm are given in Appendix A.
The optimal cost-to-go is J*0(0) = 8. It corresponds to the path A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ0, μ1, μ2, μ3, μ4}, with μk(i) = u*k(i) (for example, μ1(1) = 2 and μ1(2) = 2).
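The backward value iteration for this example can be sketched in a few lines of Python. The arc costs below are read off the figure; only the path costs A-B-F-J-K = 17 and A-D-G-I-K = 8 are stated explicitly in the text, so the full cost table is an assumed reconstruction:

```python
# Backward value iteration for the shortest path example.
# Arc costs: assumed reconstruction of the figure above.

stages = [['A'], ['B', 'C', 'D'], ['E', 'F', 'G'], ['H', 'I', 'J'], ['K']]

cost = {                                  # cost[(i, j)] = arc cost i -> j
    ('A', 'B'): 2, ('A', 'C'): 4, ('A', 'D'): 3,
    ('B', 'E'): 4, ('B', 'F'): 6,
    ('C', 'E'): 2, ('C', 'F'): 1, ('C', 'G'): 3,
    ('D', 'F'): 5, ('D', 'G'): 2,
    ('E', 'H'): 2, ('E', 'I'): 5,
    ('F', 'H'): 7, ('F', 'I'): 3, ('F', 'J'): 2,
    ('G', 'I'): 1, ('G', 'J'): 2,
    ('H', 'K'): 4, ('I', 'K'): 2, ('J', 'K'): 7,
}

J = {'K': 0.0}                  # terminal cost J_N(K) = 0
best = {}                       # optimal successor of each node
for k in range(3, -1, -1):      # stages N-1 down to 0
    for i in stages[k]:
        succ = [j for j in stages[k + 1] if (i, j) in cost]
        best[i] = min(succ, key=lambda j: cost[(i, j)] + J[j])
        J[i] = cost[(i, best[i])] + J[best[i]]

# recover the optimal path forwards from the stored decisions
path, node = ['A'], 'A'
while node != 'K':
    node = best[node]
    path.append(node)

print(J['A'], path)             # expected: 8.0 and A-D-G-I-K
```

Note that at node C two successors tie at cost 6, so the optimal policy is not unique there; the optimal cost-to-go is unaffected.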
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
51 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ ΩXk.
Decision Space
At each decision epoch, the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).
Dynamic of the System and Transition Probability
In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):

Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N − 1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:

P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
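This reduction can be made concrete with a minimal sketch. The two-state model below (state 0 = working, 1 = failed; action 0 = wait, 1 = replace) is hypothetical, not taken from the thesis; it only illustrates how fixing a control for each state turns the MDP's transition probabilities into the transition matrix of a Markov chain:

```python
# P[u][i][j]: probability of moving from state i to state j under action u
# (hypothetical numbers for a two-state working/failed component)
P = {0: [[0.7, 0.3],     # wait: a working unit fails with probability 0.3
         [0.0, 1.0]],    #       a failed unit stays failed
     1: [[1.0, 0.0],     # replace: unit is as good as new
         [1.0, 0.0]]}

mu = [0, 1]              # fixed control for each state: wait if working,
                         # replace if failed

# transition matrix of the Markov chain induced by the policy mu
M = [P[mu[i]][i] for i in (0, 1)]
print(M)                 # [[0.7, 0.3], [1.0, 0.0]]
```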
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

Ck(j, u, i) = Ck(Xk+1 = j, Uk = u, Xk = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost Ck(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).
A terminal cost CN(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:
J*(X0) = min_{Uk ∈ ΩUk(Xk)} E[ CN(XN) + Σ_{k=0}^{N−1} Ck(Xk+1, Uk, Xk) ]

Subject to Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, ..., N − 1

N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
Xk: state at stage k
Uk: decision (action) at stage k
ωk(i, u): probabilistic function of the disturbance
Ck(j, u, i): cost function
CN(i): terminal cost for state i
fk(i, u, ω): dynamic function
J*0(i): optimal cost-to-go starting from state i
52 Optimality Equation
The optimality equation for stochastic finite horizon DP is
J*k(i) = min_{u ∈ ΩUk(i)} E[ Ck(i, u) + J*k+1(fk(i, u, ω)) ]   (51)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*k(i) = min_{u ∈ ΩUk(i)} Σ_{j ∈ ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]   (52)

ΩXk: state space at stage k
ΩUk(i): decision space at stage k for state i
Pk(j, u, i): transition probability function
53 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (52). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
J*N(i) = CN(i)   ∀i ∈ ΩXN   (initialisation)

While k ≥ 0 do

J*k(i) = min_{u ∈ ΩUk(i)} Σ_{j ∈ ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]   ∀i ∈ ΩXk

U*k(i) = argmin_{u ∈ ΩUk(i)} Σ_{j ∈ ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]   ∀i ∈ ΩXk

k ← k − 1

u: decision variable
U*k(i): optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
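The backward recursion above can be sketched on a hypothetical two-state maintenance model (state 0 = working, 1 = failed; action 0 = wait, 1 = replace). The transition probabilities, costs and horizon are illustrative numbers, not taken from the thesis:

```python
# Stochastic finite-horizon value iteration on a toy maintenance model
# (hypothetical numbers): waiting is free but a working unit fails with
# probability 0.3 per stage; a failed unit costs 10 per stage in downtime;
# replacement costs 5 and returns the unit to the working state.

N = 3                           # number of stages
P = {0: [[0.7, 0.3],            # P[u][i][j], action 0 = wait
         [0.0, 1.0]],
     1: [[1.0, 0.0],            # action 1 = replace
         [1.0, 0.0]]}
C = {0: [0.0, 10.0],            # C[u][i]: stage cost of action u in state i
     1: [5.0, 5.0]}

J = [[0.0, 0.0] for _ in range(N + 1)]   # J[N] = terminal cost = 0
U = [[None, None] for _ in range(N)]     # optimal action per stage and state
for k in range(N - 1, -1, -1):
    for i in (0, 1):
        q = {u: C[u][i] + sum(P[u][i][j] * J[k + 1][j] for j in (0, 1))
             for u in (0, 1)}
        U[k][i] = min(q, key=q.get)
        J[k][i] = q[U[k][i]]

print(J[0], U[0])     # expected cost-to-go and decision at stage 0
```

With these numbers the recursion gives J0 = [2.55, 6.5]: starting from a failed unit it is worth replacing immediately, while a working unit is left alone.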
54 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with
• N stages;

• NX state variables, where the size of the set for each state variable is S;

• NU control variables, where the size of the set for each control variable is A.
The time complexity of the algorithm is O(N · S^(2·NX) · A^NU). The complexity of the problem thus increases exponentially with the size of the problem (the number of state and decision variables). This characteristic of SDP is called the curse of dimensionality.
55 Ideas for a Maintenance Optimization Model
In this section, possible state variables for a maintenance model based on SDP are discussed.
551 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for the component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered in order to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used complementarily.
Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
552 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is no consumption, some generation units are stopped, and this time can be used for maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions at offshore wind farms.
553 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).
This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process (MDP). For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice, one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
61 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ ΩX, μ(i) is an admissible control for the state i, μ(i) ∈ ΩU(i).
The objective is to find the optimal policy μ*, which minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are paid.

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

Subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1

μ: decision policy
J*(i): optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The cost at stage k has the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum will converge (as a decreasing geometric progression).

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(Xk+1, μ(Xk), Xk) ]

Subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1

α: discount factor
Average cost per stage problems
Some infinite horizon problems can be represented neither with a cost-free termination state nor as discounted problems. To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

Subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1
62 Optimality Equations
The optimality equations are formulated using the transition probability function; the shorthand notation Pij(u) = P(j, u, i) and Cij(u) = C(j, u, i) is used below.
The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):
J*(i) = min_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} Pij(u) · [Cij(u) + J*(j)]   ∀i ∈ ΩX

Jμ(i): cost-to-go function of policy μ starting from state i
J*(i): optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} Pij(u) · [Cij(u) + α · J*(j)]   ∀i ∈ ΩX
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 66.
63 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown that it does. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined for the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
64 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively. The process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ0. It can then be described by the following steps:
Step 1: Policy Evaluation

If μq+1 = μq, stop the algorithm. Else, Jμq(i) is calculated as the solution of the following linear system:

Jμq(i) = Σ_{j ∈ ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + Jμq(j)]

q: iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μq.
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μq+1(i) = argmin_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + Jμq(j)]

Go back to the policy evaluation step.
The process stops when μq+1 = μq.
At each iteration the algorithm improves the policy. If the initial policy μ0 is already good, then the algorithm will converge quickly to the optimal solution.
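The two steps can be sketched on a hypothetical discounted two-state maintenance model (state 0 = working, 1 = failed; action 0 = wait, 1 = replace; all numbers are illustrative, not from the thesis). With only two states, the policy evaluation system is solved directly with Cramer's rule:

```python
# Policy iteration for a discounted two-state toy model (hypothetical
# numbers): waiting is free but risks failure, downtime costs 10 per
# stage, replacement costs 5; discount factor alpha = 0.9.

alpha = 0.9
P = {0: [[0.7, 0.3], [0.0, 1.0]],    # P[u][i][j], action 0 = wait
     1: [[1.0, 0.0], [1.0, 0.0]]}    # action 1 = replace
C = {0: [0.0, 10.0], 1: [5.0, 5.0]}  # C[u][i]

def evaluate(mu):
    """Step 1: solve the 2x2 linear system J = C_mu + alpha * P_mu * J."""
    a = [[(1.0 if i == j else 0.0) - alpha * P[mu[i]][i][j]
          for j in (0, 1)] for i in (0, 1)]
    b = [C[mu[i]][i] for i in (0, 1)]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]        # Cramer's rule
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

mu = [0, 0]                          # initial policy: always wait
while True:
    J = evaluate(mu)
    # Step 2: policy improvement
    new = [min((0, 1), key=lambda u: C[u][i]
               + alpha * sum(P[u][i][j] * J[j] for j in (0, 1)))
           for i in (0, 1)]
    if new == mu:
        break                        # policy solves its own improvement
    mu = new

print(mu, J)   # optimal policy: wait while working, replace when failed
```

Starting from the poor policy "always wait", the algorithm improves the policy twice before reaching the fixed point.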
65 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value Jμk(i).
While m ≥ 0 do

J^m_μk(i) = Σ_{j ∈ ΩX} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_μk(j)]   ∀i ∈ ΩX

m ← m − 1

m: number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jμk is approximated by J^0_μk.
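A sketch of modified policy iteration on a hypothetical discounted two-state maintenance model (state 0 = working, 1 = failed; action 0 = wait, 1 = replace; illustrative numbers, not from the thesis). The exact linear solve is replaced by M value iteration sweeps under the fixed policy, initialized above the true values:

```python
# Modified policy iteration on a two-state toy model (hypothetical
# numbers): the evaluation step runs M sweeps of value iteration for
# the fixed policy instead of solving a linear system exactly.

alpha, M = 0.9, 100
P = {0: [[0.7, 0.3], [0.0, 1.0]],    # P[u][i][j], action 0 = wait
     1: [[1.0, 0.0], [1.0, 0.0]]}    # action 1 = replace
C = {0: [0.0, 10.0], 1: [5.0, 5.0]}

def q(i, u, J):
    """Expected one-stage cost plus discounted cost-to-go."""
    return C[u][i] + alpha * sum(P[u][i][j] * J[j] for j in (0, 1))

mu = [0, 0]                # initial policy: always wait
J = [200.0, 200.0]         # initial guess chosen above the true values
for _ in range(50):        # outer policy-improvement iterations
    for _ in range(M):     # approximate evaluation of the fixed policy
        J = [q(i, mu[i], J) for i in (0, 1)]
    new = [min((0, 1), key=lambda u, i=i: q(i, u, J)) for i in (0, 1)]
    if new == mu:
        break
    mu = new

print(mu, [round(v, 3) for v in J])
```

The evaluation is only approximate (error shrinks like alpha^M), so the final J is close to, but not exactly, the true cost-to-go of the final policy.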
66 Average Cost-to-go Problems
The methods presented in Sections 62-65 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X ∈ ΩX, there are a unique λμ and a vector hμ such that:

hμ(X) = 0

λμ + hμ(i) = Σ_{j ∈ ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)]   ∀i ∈ ΩX

This λμ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and an optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ ΩX

μ*(i) = argmin_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ ΩX
661 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. X is an arbitrary reference state, and h0(i) is chosen arbitrarily.
Hk = min_{u ∈ ΩU(X)} Σ_{j ∈ ΩX} P(j, u, X) · [C(j, u, X) + hk(j)]

hk+1(i) = min_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk   ∀i ∈ ΩX

μk+1(i) = argmin_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + hk(j)]   ∀i ∈ ΩX
The sequence hk will converge if the Markov decision process is unichain, and the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
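The recursion can be sketched on a hypothetical unichain two-state maintenance model (state 0 = working, 1 = failed; action 0 = wait, 1 = replace; illustrative numbers, not from the thesis; replacement here can itself fail with probability 0.1 so that every stationary policy is unichain). State 0 is the reference state, so hk(0) stays at 0 and Hk converges to the optimal average cost:

```python
# Relative value iteration for an average-cost two-state toy model
# (hypothetical numbers); the reference state is state 0.

P = {0: [[0.7, 0.3], [0.0, 1.0]],     # wait
     1: [[0.9, 0.1], [0.9, 0.1]]}     # replace (the new unit may fail too)
C = {0: [0.0, 10.0], 1: [5.0, 5.0]}   # C[u][i]

def q(i, u, h):
    """One-stage cost plus differential cost-to-go."""
    return C[u][i] + sum(P[u][i][j] * h[j] for j in (0, 1))

h = [0.0, 0.0]
for _ in range(200):
    H = min(q(0, u, h) for u in (0, 1))        # value at reference state
    h = [min(q(i, u, h) for u in (0, 1)) - H for i in (0, 1)]

mu = [min((0, 1), key=lambda u, i=i: q(i, u, h)) for i in (0, 1)]
print(round(H, 4), [round(v, 4) for v in h], mu)
```

With these numbers the iteration converges to the average cost λ* = 1.25 per stage, differential cost h = [0, 25/6], and policy "wait while working, replace when failed".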
662 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: X can be chosen arbitrarily.
Step 1: Policy Evaluation
If λq+1 = λq and hq+1(i) = hq(i) ∀i ∈ ΩX, stop the algorithm. Else, solve the system of equations:

hq(X) = 0

λq + hq(i) = Σ_{j ∈ ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + hq(j)]   ∀i ∈ ΩX
Step 2: Policy Improvement

μq+1(i) = argmin_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + hq(j)]   ∀i ∈ ΩX

q ← q + 1
67 Linear Programming
The three types of IHSDP models can be reformulated so as to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case, the optimal cost-to-go function satisfies

J*(i) = min_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + α · J*(j)]   ∀i ∈ ΩX

and J*(i) is the solution of the following linear programming model:

Maximize Σ_{i ∈ ΩX} J(i)

Subject to J(i) ≤ Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + α · J(j)]   ∀i, u
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
68 Efficiency of the Algorithms
For details about the complexity of the algorithms [28] and [29] are recommended
Let n and m denote the numbers of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m; a DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, it will converge quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
69 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, but the actions are not made continuously (that kind of problem refers to optimal control theory).
SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how an SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Process - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
71 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 72. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 74.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i), if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation. Assume that a trajectory (X0, ..., XN) has been generated according to the policy μ, and that the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.
The cost-to-go resulting from the trajectory starting from the state Xk is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

where V(Xk) denotes the cost-to-go of a trajectory starting from state Xk.
If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(im)

where V(im) is the cost-to-go of the trajectory starting from state i at its m-th visit.
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(im) − J(i)],  with γ = 1/m, where m is the number of the visit.
From a trajectory point of view,

J(Xk) := J(Xk) + γ_{Xk} · [V(Xk) − J(Xk)]

where γ_{Xk} corresponds to 1/m, with m the number of times Xk has already been visited by trajectories.
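As a minimal sketch of this running-average policy evaluation (all states, costs and trajectories below are hypothetical, not taken from the thesis):

```python
# Incremental Monte Carlo policy evaluation: J(i) += (1/m) * (V - J(i)),
# where m counts how many trajectory returns have been averaged for state i.
from collections import defaultdict

def mc_policy_evaluation(trajectories):
    """trajectories: list of [(state, transition cost), ...], each ending at a terminal state."""
    J = defaultdict(float)      # estimated cost-to-go per state
    visits = defaultdict(int)   # m: number of returns averaged per state
    for traj in trajectories:
        # cost-to-go from each visited state = sum of the remaining costs
        remaining = 0.0
        returns = []
        for state, cost in reversed(traj):
            remaining += cost
            returns.append((state, remaining))
        for state, V in returns:
            visits[state] += 1
            gamma = 1.0 / visits[state]          # step size 1/m
            J[state] += gamma * (V - J[state])   # running average
    return dict(J)

# Two hypothetical trajectories of (state, transition cost) pairs
trajs = [[("A", 1.0), ("B", 2.0)], [("A", 3.0), ("B", 2.0)]]
J = mc_policy_evaluation(trajs)
print(J)  # J["A"] averages the two returns 3.0 and 5.0 -> 4.0; J["B"] -> 2.0
```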
With the preceding algorithm, V(Xk) must be calculated from the whole trajectory, and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).
At each transition of the trajectory, the cost-to-go estimates J(Xk) of the states already visited are updated. Assume that the l-th transition has just been generated; then J(Xk) is updated for all the states that have been visited previously during the trajectory:
J(Xk) := J(Xk) + γ_{Xk} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)],  ∀k = 0, ..., l
TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) := J(Xk) + γ_{Xk} · λ^{l−k} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)],  ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is
J(Xk) := J(Xk) + γ_{Xk} · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
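The TD(0) update can be sketched as follows; the states, the costs and the constant step size γ are hypothetical simplifications (the text uses the decreasing 1/m schedule):

```python
# TD(0) policy evaluation: after each observed transition (x, x_next, cost),
# update only the current state's estimate toward cost + J(x_next).
def td0_update(J, x, x_next, cost, gamma):
    """One TD(0) step; J is a dict state -> estimated cost-to-go."""
    J[x] = J.get(x, 0.0) + gamma * (cost + J.get(x_next, 0.0) - J.get(x, 0.0))

J = {}
# Hypothetical trajectory: A -> B -> terminal T (J(T) stays 0)
for _ in range(200):                     # replay the same trajectory until convergence
    td0_update(J, "B", "T", 2.0, 0.5)
    td0_update(J, "A", "B", 1.0, 0.5)
print(J)  # J["B"] -> 2.0 and J["A"] -> 3.0, approximately
```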
Q-factors. Once J^{μk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^{μk}(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J^{μk}(j)]

Note that C(j, u, i) must be known. The improved policy is

μk+1(i) = argmin_{u∈ΩU(i)} Q^{μk}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J^{μk} and Q^{μk} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do:

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ) · Q(Xk, Uk) + γ · [Ck + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
The exploration/exploitation trade-off. The convergence of the algorithms to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.
In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
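A standard way to implement this trade-off, not detailed in the text, is ε-greedy selection: explore a random control with a small probability ε, otherwise exploit the current greedy policy. A sketch with hypothetical Q-factors:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon explore a random control,
    otherwise exploit the current greedy policy (minimum Q-factor)."""
    if random.random() < epsilon:
        return random.choice(actions)                     # exploration phase
    return min(actions, key=lambda a: Q[(state, a)])      # exploitation (greedy)

# Hypothetical Q-factors for one state with two controls
Q = {("A", 0): 3.0, ("A", 1): 4.0}
random.seed(1)
choices = [epsilon_greedy(Q, "A", [0, 1], 0.1) for _ in range(1000)]
print(choices.count(0) / 1000)  # mostly the greedy control 0 (about 95%)
```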
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.
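The model-building half of indirect learning can be sketched by counting observed transitions and averaging observed costs (hypothetical helper functions, assuming the P(j, u, i) and C(j, u, i) notation of the text):

```python
from collections import defaultdict

# Estimate P(j, u, i) and C(j, u, i) from samples (Xk, Xk+1, Uk, Ck).
counts = defaultdict(int)        # (i, u) -> number of samples
trans = defaultdict(int)         # (i, u, j) -> number of observed transitions
cost_sum = defaultdict(float)    # (i, u, j) -> accumulated observed cost

def observe(i, j, u, c):
    """Record one sample: control u was chosen in state i, leading to j with cost c."""
    counts[(i, u)] += 1
    trans[(i, u, j)] += 1
    cost_sum[(i, u, j)] += c

def P_hat(j, u, i):
    """Empirical transition probability estimate."""
    return trans[(i, u, j)] / counts[(i, u)] if counts[(i, u)] else 0.0

def C_hat(j, u, i):
    """Empirical transition cost estimate."""
    return cost_sum[(i, u, j)] / trans[(i, u, j)] if trans[(i, u, j)] else 0.0

# Hypothetical samples from state "A" under control 0
observe("A", "B", 0, 2.0)
observe("A", "B", 0, 4.0)
observe("A", "C", 0, 1.0)
print(P_hat("B", 0, "A"), C_hat("B", 0, "A"))  # 2/3 and 3.0
```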
7.4 Supervised Learning
With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J^μ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J^μ. In the tabular representation investigated previously, J^μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.
Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).
There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.
A general approach to a supervised learning problem can be:
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
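These steps can be illustrated with a simple linear approximation structure J̃(i, r) = r·φ(i), trained by least squares; the features and the training set below are hypothetical:

```python
import numpy as np

# Approximate a cost-to-go function with a linear structure J(i, r) = r . phi(i).
def phi(i):
    """Hypothetical features of state i (here a scalar age): [1, i, i^2]."""
    return np.array([1.0, i, i * i])

# Hypothetical training set: (state, sampled cost-to-go from simulation)
states = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
targets = np.array([1.0, 1.5, 3.0, 5.5, 9.0])

A = np.vstack([phi(s) for s in states])          # design matrix
r, *_ = np.linalg.lstsq(A, targets, rcond=None)  # train: least-squares fit

# Only r is stored; J can now be evaluated anywhere, not just at sampled states
J_tilde = lambda i: phi(i) @ r
print(J_tilde(2.5))  # about 4.125 (the data here happens to be exactly quadratic)
```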
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a true training set does not exist. The training sets are obtained either from simulation or from real-time samples. This is already an approximation of the real function.
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon maintenance scheduling problem for generating units. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance calculated.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is raised. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Model / Method                  | Characteristics                  | Possible application in maintenance optimization       | Methods                 | Advantages / disadvantages
Finite horizon dynamic progr.   | Model can be non-stationary      | Short-term maintenance scheduling                      | Value iteration         | Limited state space (number of components)
Markov decision processes       | Stationary model                 |                                                        | Classical MDP methods   | Possible approaches:
 - average cost-to-go           |                                  | Continuous-time condition monitoring maintenance opt.  | Value iteration (VI)    | can converge fast for a high discount factor
 - discounted                   |                                  | Short-term maintenance optimization                    | Policy iteration (PI)   | faster in general
 - shortest path                |                                  |                                                        | Linear programming      | possible additional constraints; state space more limited than with VI & PI
Approximate dynamic programming | Can handle large state spaces    | Same as MDP, for larger systems                        | TD-learning, Q-learning | can work without an explicit model
Semi-Markov decision processes  | Can optimize inspection interval | Optimization for inspection-based maintenance          | Same as MDP             | complex (average cost-to-go approach)
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component, and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance in a profitable period. This idea was taken into account in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for one component
NCM   Number of corrective maintenance states for one component
Costs
CE(s, k)   Electricity cost at stage k for the electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i
Variables
i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage
State and Control Space
x1k   Component state at stage k
x2k   Electricity state at stage k
Probability function
λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi
Sets
Ωx1     Component state space
Ωx2     Electricity state space
ΩU(i)   Decision space for state i
States notations
W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.
• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.
• If the system is not working, a cost for interruption CI per stage is considered.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k)^T,   x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component, and Ωx2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case, Tmax can correspond, for example, to the age at which λ(t) > 50%. This latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
Figure 9.1: Example of the Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios (electricity prices in SEK/MWh as a function of the stage, for scenarios 1-3), NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:
Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance
The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}; ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q·Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                         | u | j1    | P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}     | 0 | Wq+1  | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}     | 0 | CM1   | λ(Wq)
WNW                        | 0 | WNW   | 1 − λ(WNW)
WNW                        | 0 | CM1   | λ(WNW)
Wq, q ∈ {0, ..., NW}       | 1 | PM1   | 1
PMq, q ∈ {1, ..., NPM−2}   | ∅ | PMq+1 | 1
PMNPM−1                    | ∅ | W0    | 1
CMq, q ∈ {1, ..., NCM−2}   | ∅ | CMq+1 | 1
CMNCM−1                    | ∅ | W0    | 1
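As an illustration, Table 9.1 can be turned into transition matrices, one per decision, for the small instance of Figure 9.1 (NW = 4, NPM = 2, NCM = 3); the failure probabilities λ(Wq) below are hypothetical:

```python
import numpy as np

# States, ordered: W0..W4, PM1, CM1, CM2 (NW = 4, NPM = 2, NCM = 3, as in Figure 9.1)
states = ["W0", "W1", "W2", "W3", "W4", "PM1", "CM1", "CM2"]
idx = {s: n for n, s in enumerate(states)}
lam = [0.05, 0.10, 0.15, 0.20, 0.25]  # hypothetical per-stage failure probabilities lambda(Wq)

n = len(states)
P0 = np.zeros((n, n))  # u = 0: no preventive maintenance (PM/CM rows put here for convenience)
P1 = np.zeros((n, n))  # u = 1: preventive maintenance (only defined for W states)

for q in range(5):                                    # rows of Table 9.1 for working states
    nxt = "W4" if q == 4 else f"W{q + 1}"
    P0[idx[f"W{q}"], idx[nxt]] = 1 - lam[q]           # survive the stage
    P0[idx[f"W{q}"], idx["CM1"]] = lam[q]             # fail -> corrective maintenance
    P1[idx[f"W{q}"], idx["PM1"]] = 1.0                # preventive replacement decided
# Maintenance states progress deterministically and end in W0
P0[idx["PM1"], idx["W0"]] = 1.0
P0[idx["CM1"], idx["CM2"]] = 1.0
P0[idx["CM2"], idx["W0"]] = 1.0

assert np.allclose(P0.sum(axis=1), 1.0)  # every row is a probability distribution
```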
Table 9.2: Example of transition matrices for electricity scenarios

P1E = [ 1    0    0  ]     P2E = [ 1/3  1/3  1/3 ]     P3E = [ 0.6  0.2  0.2 ]
      [ 0    1    0  ]           [ 1/3  1/3  1/3 ]           [ 0.2  0.6  0.2 ]
      [ 0    0    1  ]           [ 1/3  1/3  1/3 ]           [ 0.2  0.2  0.6 ]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)    | 0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)   | P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
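The non-stationary electricity transitions of Tables 9.2 and 9.3 can be represented by mapping each stage to its matrix, as a sketch:

```python
import numpy as np

# Transition matrices of Table 9.2 (rows: current scenario i2, columns: next scenario j2)
P1E = np.eye(3)
P2E = np.full((3, 3), 1 / 3)
P3E = np.array([[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]])

# Stage -> matrix assignment of Table 9.3 (12-stage horizon)
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def electricity_transition(k, i2, j2):
    """P_k(j2, i2): probability of moving from scenario i2 to j2 at stage k."""
    return schedule[k][i2, j2]

print(electricity_transition(3, 0, 0))  # stage 3 uses P3E: 0.6
```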
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:
• Reward for electricity generation: G·Ts·CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI
Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                         | u | j1    | Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}     | 0 | Wq+1  | G·Ts·CE(i2, k)
Wq, q ∈ {0, ..., NW−1}     | 0 | CM1   | CI + CCM
WNW                        | 0 | WNW   | G·Ts·CE(i2, k)
WNW                        | 0 | CM1   | CI + CCM
Wq                         | 1 | PM1   | CI + CPM
PMq, q ∈ {1, ..., NPM−2}   | ∅ | PMq+1 | CI + CPM
PMNPM−1                    | ∅ | W0    | CI + CPM
CMq, q ∈ {1, ..., NCM−2}   | ∅ | CMq+1 | CI + CCM
CMNCM−1                    | ∅ | W0    | CI + CCM
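Putting Tables 9.1 and 9.4 together, the one-component model can be solved by backward induction (value iteration over the stages). The sketch below simplifies to a single electricity scenario with a constant price, treats production as a negative cost, and uses hypothetical numbers throughout:

```python
import numpy as np

# Backward induction for a simplified one-component model: one electricity
# scenario (constant price), states W0..W4, PM1, CM1, CM2; all numbers hypothetical.
states = ["W0", "W1", "W2", "W3", "W4", "PM1", "CM1", "CM2"]
idx = {s: n for n, s in enumerate(states)}
lam = [0.05, 0.10, 0.15, 0.20, 0.25]   # hypothetical per-stage failure probabilities
G_Ts_CE = 10.0                          # reward for one stage of production
C_I, C_PM, C_CM = 5.0, 3.0, 8.0
N = 52                                  # stages

def transitions(i, u):
    """List of (j, prob, cost) following Tables 9.1 and 9.4 (production as negative cost)."""
    s = states[i]
    if s.startswith("W"):
        q = int(s[1])
        if u == 1:
            return [(idx["PM1"], 1.0, C_I + C_PM)]
        nxt = idx["W4"] if q == 4 else idx[f"W{q + 1}"]
        return [(nxt, 1 - lam[q], -G_Ts_CE), (idx["CM1"], lam[q], C_I + C_CM)]
    if s == "PM1":
        return [(idx["W0"], 1.0, C_I + C_PM)]
    if s == "CM1":
        return [(idx["CM2"], 1.0, C_I + C_CM)]
    return [(idx["W0"], 1.0, C_I + C_CM)]  # CM2

J = np.zeros(len(states))        # terminal cost CN(i) = 0
policy = {}
for k in range(N - 1, -1, -1):   # backward in time
    Jk = np.empty_like(J)
    uk = {}
    for i, s in enumerate(states):
        controls = [0, 1] if s.startswith("W") else [None]  # no decision in PM/CM states
        values = {u: sum(p * (c + J[j]) for j, p, c in transitions(i, u)) for u in controls}
        u_best = min(values, key=values.get)
        Jk[i], uk[s] = values[u_best], u_best
    J, policy = Jk, uk
print(policy)  # optimal first-stage decision for each state
```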
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.
This can be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c
Costs
CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  Electricity state at the current stage
jc, c ∈ {1, ..., NC}   State of component c for the next stage
jNC+1                  Electricity state for the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}   State of component c at stage k
xc                      A component state
xNC+1k                  Electricity state at stage k
uck                     Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc       State space for component c
ΩxNC+1    Electricity state space
Ωuc(ic)   Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.
• It is possible at each stage to decide to replace a component, to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance, of any kind, is done on the system.
• The average production of the generating unit is G kW. If no component of the unit is in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)^T   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c; xNC+1k represents the electricity state.
Component Space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity Space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)^T   (9.3)
The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}; ∅ otherwise
9.2.4.3 Transition Probability
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = Π_{c=1}^{NC} P(jc, 0, ic)
Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is made, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = Π_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1              if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
P^c = 0              otherwise
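The two cases can be combined into one routine that computes the joint transition probability as a product over the components; the per-component function p_component(c, j, u, i) and the demo below are hypothetical:

```python
# Joint transition probability for the multi-component model.
# p_component(c, j, u, i) is assumed given, e.g. read from each component's Table 9.1.

def system_working(i_states, working_sets):
    """True if every component is in a working state (the system produces)."""
    return all(i in working_sets[c] for c, i in enumerate(i_states))

def joint_probability(j_states, u, i_states, p_component, working_sets):
    if system_working(i_states, working_sets) and all(uc == 0 for uc in u):
        # Case 1: all components age independently
        prob = 1.0
        for c, (jc, ic) in enumerate(zip(j_states, i_states)):
            prob *= p_component(c, jc, 0, ic)
        return prob
    # Case 2: system down or maintenance decided
    prob = 1.0
    for c, (jc, uc, ic) in enumerate(zip(j_states, u, i_states)):
        if uc == 1 or ic not in working_sets[c]:
            prob *= p_component(c, jc, 1, ic)   # component follows its own transitions
        else:
            prob *= 1.0 if jc == ic else 0.0    # a working, untouched component does not age
    return prob

# Tiny hypothetical two-component demo: states "W" (working) and "CM" (repair)
def p_demo(c, j, u, i):
    if i == "W" and u == 0:
        return {"W": 0.9, "CM": 0.1}.get(j, 0.0)   # may fail with probability 0.1
    if i == "W" and u == 1:
        return 1.0 if j == "PM" else 0.0            # preventive replacement decided
    return 1.0 if j == "W" else 0.0                 # maintenance finishes in one stage

working = [{"W"}, {"W"}]
p = joint_probability(("W", "W"), (0, 0), ("W", "W"), p_demo, working)
print(p)  # Case 1: 0.9 * 0.9 = 0.81
```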
9.2.4.4 Cost Function
As for the transition probabilities there are 2 cases
Case 1
If all the components keep working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: j_c ∈ {W1, ..., W_NWc} (no component fails during the stage):

C((j_1, ..., j_NC) | 0, (i_1, ..., i_NC)) = G · Ts · CE(i_(NC+1), k)
Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j_1, ..., j_NC) | (u_1, ..., u_NC), (i_1, ..., i_NC)) = CI + ∑_(c=1)^NC C_c

with C_c =
  C_CMc   if i_c ∈ {CM1, ..., CM_(NCMc−1)} or j_c = CM1
  C_PMc   if i_c ∈ {PM1, ..., PM_(NPMc−1)} or j_c = PM1
  0       otherwise
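A minimal sketch of the two cost cases in Python. The numerical values of G, Ts, the interruption and maintenance costs, and the state naming convention are made-up assumptions for illustration:

```python
# Made-up constants: average power G (MW), stage length Ts (h),
# interruption cost C_I, and per-component maintenance costs.
G, TS = 2.0, 24.0
C_I = 500.0
C_CM, C_PM = 1000.0, 200.0
WORKING = {"W0", "W1", "W2"}

def stage_cost(i, u, j, ce_price):
    """Stage cost for a transition from states i to j under decision u."""
    # Case 1: every component keeps working and no maintenance is decided:
    # the electricity produced is sold (modelled here as a negative cost).
    if all(jc in WORKING for jc in j) and all(uc == 0 for uc in u):
        return -G * TS * ce_price
    # Case 2: the system is down during the stage: interruption cost plus
    # the cost of every maintenance action in progress or just started.
    cost = C_I
    for ic, jc in zip(i, j):
        if ic.startswith("CM") or jc == "CM1":
            cost += C_CM            # corrective maintenance on this component
        elif ic.startswith("PM") or jc == "PM1":
            cost += C_PM            # preventive maintenance on this component
    return cost

# Case 1: both components age normally, electricity price 50 per MWh.
print(stage_cost(("W1", "W1"), (0, 0), ("W2", "W2"), ce_price=50.0))  # -2400.0
```

Representing the production reward as a negative cost keeps the whole problem as a minimization, consistent with the cost-to-go formulation of Chapter 4.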
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas that could impact the model:
• Manpower. It would be interesting to limit the number of maintenance actions that can be performed at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality, there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically the fastest to converge; however, for a high discount rate, the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities over a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The ADP methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of a finite horizon model and requires the system to be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to the monitoring of single components (possibly with several monitored parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = J*(K) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4, u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2, u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7, u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
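The hand computation above is easy to check with a short script. The sketch below (Python) encodes the arc costs C(k, i, u) of the example of Section 4.2.3 in a dictionary, where the decision u is taken as the index of the next state, and runs the backward recursion:

```python
# Arc costs C[(k, i, u)] of the shortest path example: from state i at
# stage k, decision u leads to state u at stage k+1.
C = {(0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,        # A -> B, C, D
     (1, 0, 0): 4, (1, 0, 1): 6,                      # B -> E, F
     (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,        # C -> E, F, G
     (1, 2, 1): 5, (1, 2, 2): 2,                      # D -> F, G
     (2, 0, 0): 2, (2, 0, 1): 5,                      # E -> H, I
     (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,        # F -> H, I, J
     (2, 2, 1): 1, (2, 2, 2): 2,                      # G -> I, J
     (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7}        # H, I, J -> K

J = {(4, 0): 0}                                       # terminal cost at K
U = {}
for k in range(3, -1, -1):                            # backward induction
    for i in {i for (kk, i, _) in C if kk == k}:
        options = {u: C[(k, i, u)] + J[(k + 1, u)]
                   for (kk, ii, u) in C if (kk, ii) == (k, i)}
        U[(k, i)] = min(options, key=options.get)
        J[(k, i)] = options[U[(k, i)]]

print(J[(0, 0)], U[(0, 0)])  # optimal cost 8, first decision 2 (node D)
```

The script reproduces the cost-to-go values of the hand computation, stage by stage.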
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS '06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965
[8] R Bellman Dynamic Programming Princeton University Press Princeton1957
[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997
[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976
[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979
[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005
[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996
[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006
[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991
[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997
[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966
[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004
[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982
[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004
[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004
[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004
[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA '99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
[38] Alagar Rangan, Dimple Thyagarajan, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
2.2.1 Age Replacement Policies
Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow and Proschan [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process whose rate is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.
2.2.2 Block Replacement Policies
In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow and Proschan [7] describe a basic block replacement model. To avoid replacing a component that has just been replaced, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to account for the operational cost of a unit being higher as it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.
2.2.3 Condition Based Maintenance
CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gear box, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation with failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question concerns the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement
of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.
For components subject to inspection, at each decision epoch one must decide whether maintenance should be performed and when the next inspection should occur. In [2], the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9], a Semi-Markov Decision Process (SMDP, see Chapter 4) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (the monitored variables).
2.2.4 Opportunistic Maintenance Models
Opportunistic maintenance considers unexpected opportunities for performing preventive maintenance: with the failure of one component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example. Transportation to the wind farm by boat or helicopter is necessary and can be very expensive; by grouping maintenance actions, money could be saved.

Haurie and L'Ecuyer [19] focus on the group preventive replacement policy of m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short-term information. The model can be used for many maintenance optimization models.
2.2.5 Other Types of Models and Criteria of Classification
Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g., multi-component models are more relevant for power systems. The time horizon considered in the model
is important. Many articles consider an infinite time horizon; more attention should be paid to finite horizons, since they are more practical. Another characteristic of a model is the time representation: whether discrete or continuous time is considered. A distinction can also be made between models with deterministic and stochastic lifetimes of components. Among the stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.
The method used for solving the problem has an influence on the solution: a model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power
System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.
3.1 Power System Presentation
Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables with limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description
A simple description of the power system includes the following main parts:
1. Generation: the generating units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.
2. Transmission: the transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects the distribution systems with the generation units.
3. Distribution: the distribution system is at a voltage level below transmission and connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).
4. Consumption: the consumers can be divided into different categories, e.g. industry, commerce, households, offices, agriculture, etc. The interruption costs are in general different for the different categories of consumers, and also depend on the time of the outage.
The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.
The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others: if a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as blackouts.
3.1.2 Maintenance in Power Systems
The objective is to find the right way to do maintenance. Corrective maintenance and preventive maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to finding a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).
Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more
attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
3.2 Costs
Possible costs and incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: cost of the maintenance team that performs the maintenance actions.
• Spare part cost: the cost of a new component is an important part of the maintenance cost.
• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.
• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.
• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.
• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is performed on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: the size and availability of the maintenance staff is limited.
• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.
• Weather: the weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to perform maintenance on offshore wind farms.
• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed; the transportation takes time and has a cost.
• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.
• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.
• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic
Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:
An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system: the previous decisions should not influence the actual evolution of the system and the possible actions.
Basically, in maintenance problems it would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment; they do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.
If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and the action chosen. The system is then referred to as probabilistic, or stochastic.
Functional failures are in general represented as stochastic events In consequencestochastic maintenance optimization models are interesting
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 4 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.
Chapters 5 and 6 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner over time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 3, 4 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the time interval between two stages will have an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuous set of decision epochs implies that decisions can be made either continuously, at some points chosen by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 5. Continuous decision making refers to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 4.2).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 6 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example: a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space Ω_Xk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ Ω_Uk(i).
Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_(k+1) = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated to the terminal state (the state at stage N).
Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:
J*_0(X0) = min_{Uk} [ Σ_{k=0}^{N−1} Ck(Xk, Uk) + CN(XN) ]

Subject to Xk+1 = fk(Xk, Uk), k = 0, ..., N − 1
N  Number of stages
k  Stage
i  State at the current stage
j  State at the next stage
Xk  State at stage k
Uk  Decision (action) at stage k
Ck(i, u)  Cost function
CN(i)  Terminal cost for state i
fk(i, u)  Dynamic function
J*_0(i)  Optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:
J*_k(i) = min_{u∈Ω^U_k(i)} { Ck(i, u) + J*_{k+1}(fk(i, u)) }   (4.1)
J*_k(i)  Optimal cost-to-go from stage k to N, starting from state i
The value iteration algorithm is a direct consequence of the optimality equation
J*_N(i) = CN(i)  ∀i ∈ Ω^X_N

J*_k(i) = min_{u∈Ω^U_k(i)} { Ck(i, u) + J*_{k+1}(fk(i, u)) }  ∀i ∈ Ω^X_k

U*_k(i) = argmin_{u∈Ω^U_k(i)} { Ck(i, u) + J*_{k+1}(fk(i, u)) }  ∀i ∈ Ω^X_k
u  Decision variable
U*_k(i)  Optimal decision (action) at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered.
[Figure: the shortest path network. Stage 0: node A; Stage 1: nodes B, C, D; Stage 2: nodes E, F, G; Stage 3: nodes H, I, J; Stage 4: node K. Each arc between consecutive stages carries a cost (distance).]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7 = 17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:
Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}
Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notation is used:
Ω^U_k(i) = {0, 1} for i = 0, {0, 1, 2} for i = 1, {1, 2} for i = 2, for k = 1, 2, 3

Ω^U_0(0) = {0, 1, 2} for k = 0
For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E, or U1(0) = 1 for the transition B ⇒ F.
Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F, or u1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ0, μ1, ..., μN}, where μk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*0, μ*1, ..., μ*N}.
Dynamic and Cost Functions
The dynamic function of the example is simple, thanks to the notation used: fk(i, u) = u.
The transition costs are defined as equal to the distance from one state to the resulting state of the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J*_0(0) = min_{Uk∈Ω^U_k(Xk)} [ Σ_{k=0}^{3} Ck(Xk, Uk) + C4(X4) ]

Subject to Xk+1 = fk(Xk, Uk), k = 0, 1, 2, 3
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forward by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.
The solutions of the algorithm are given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ0, μ1, μ2, μ3, μ4}, with μk(i) = u*_k(i) (for example, μ1(1) = 2, μ1(2) = 2).
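The backward recursion above can be sketched in a few lines of code. Note that the full set of arc costs in the figure is not reproduced here; the costs below are illustrative, chosen only to be consistent with the values quoted in the text (the path A-B-F-J-K costs 2+6+2+7 = 17, C(B ⇒ E) = 4, and the optimal path A ⇒ D ⇒ G ⇒ I ⇒ K has cost 8).

```python
# Backward value iteration on a stage-structured shortest path problem.
# Arc costs are illustrative (the thesis figure is not fully reproduced),
# chosen to match the path costs quoted in the text.

# arcs[k][node] = list of (next_node, cost) for stage k
arcs = [
    {"A": [("B", 2), ("C", 4), ("D", 3)]},
    {"B": [("E", 4), ("F", 6)],
     "C": [("E", 3), ("F", 2), ("G", 2)],
     "D": [("F", 3), ("G", 2)]},
    {"E": [("H", 5), ("I", 2)],
     "F": [("I", 2), ("J", 2)],
     "G": [("I", 2), ("J", 1)]},
    {"H": [("K", 3)], "I": [("K", 1)], "J": [("K", 7)]},
]

def solve(arcs, start="A", terminal="K"):
    J = {terminal: 0.0}                       # J*_N(K) = 0: no terminal cost
    policy = {}
    for stage in reversed(arcs):              # backward recursion over stages
        for node, choices in stage.items():
            # J*_k(i) = min_u { C_k(i, u) + J*_{k+1}(f_k(i, u)) }
            nxt, cost = min(choices, key=lambda c: c[1] + J[c[0]])
            J[node] = cost + J[nxt]
            policy[node] = nxt
    # recover the optimal path forward from the start node
    path, node = [start], start
    while node != terminal:
        node = policy[node]
        path.append(node)
    return J, path

J, path = solve(arcs)
print(J["A"], path)   # -> 8.0 ['A', 'D', 'G', 'I', 'K']
```

Only the cost-to-go of each node is computed once, instead of enumerating every complete path.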
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ Ω^X_k.
Decision Space
At each decision epoch, the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on
the stage: u ∈ Ω^U_k(i).
Dynamic of the System and Transition Probability
In contrast with the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):
Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N − 1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control are i and u at stage k. These probabilities can also depend on the stage:
Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:
P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:
Ck(j, u, i) = Ck(Xk+1 = j, Uk = u, Xk = i)
If the transition (i, j) occurs at stage k when the decision is u, then a cost Ck(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(j, u, i).
A terminal cost CN(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:
J*(X0) = min_{Uk∈Ω^U_k(Xk)} E[ CN(XN) + Σ_{k=0}^{N−1} Ck(Xk+1, Uk, Xk) ]

Subject to Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, ..., N − 1
N  Number of stages
k  Stage
i  State at the current stage
j  State at the next stage
Xk  State at stage k
Uk  Decision (action) at stage k
ωk(i, u)  Probabilistic function of the disturbance
Ck(j, u, i)  Cost function
CN(i)  Terminal cost for state i
fk(i, u, ω)  Dynamic function
J*_0(i)  Optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is
J*_k(i) = min_{u∈Ω^U_k(i)} E[ Ck(i, u) + J*_{k+1}(fk(i, u, ω)) ]   (5.1)
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} Pk(j, u, i) · [Ck(j, u, i) + J*_{k+1}(j)]   (5.2)
Ω^X_k  State space at stage k
Ω^U_k(i)  Decision space at stage k for state i
Pk(j, u, i)  Transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
J*_N(i) = CN(i)  ∀i ∈ Ω^X_N  (Initialisation)

While k ≥ 0 do

J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} Pk(j, u, i) · [Ck(j, u, i) + J*_{k+1}(j)]  ∀i ∈ Ω^X_k

U*_k(i) = argmin_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} Pk(j, u, i) · [Ck(j, u, i) + J*_{k+1}(j)]  ∀i ∈ Ω^X_k

k ← k − 1
u  Decision variable
U*_k(i)  Optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
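The stochastic value iteration recursion above can be sketched in code. The model below is an illustrative two-state component (0 = working, 1 = failed) with actions 0 = do nothing and 1 = replace; the probabilities, costs, terminal penalty and horizon are invented for illustration and are not taken from the thesis.

```python
# Stochastic finite-horizon value iteration, as sketched in the text,
# on an invented 2-state / 2-action component model.
N = 4                        # number of stages
states, actions = [0, 1], [0, 1]
# P[u][i][j]: transition probabilities; C[u][i][j]: transition costs
P = {0: {0: {0: 0.7, 1: 0.3}, 1: {0: 0.0, 1: 1.0}},   # u = 0: do nothing
     1: {0: {0: 1.0, 1: 0.0}, 1: {0: 1.0, 1: 0.0}}}   # u = 1: replace
C = {0: {0: {0: 0.0, 1: 5.0}, 1: {1: 4.0}},           # failure / downtime costs
     1: {0: {0: 3.0}, 1: {0: 3.0}}}                   # replacement cost
CN = {0: 0.0, 1: 6.0}        # terminal penalty for ending in the failed state

J = [dict() for _ in range(N + 1)]
U = [dict() for _ in range(N)]
J[N] = dict(CN)                                       # initialisation
for k in range(N - 1, -1, -1):                        # backward recursion
    for i in states:
        best_u, best = None, float("inf")
        for u in actions:
            # expected cost: sum_j P(j,u,i) * [C(j,u,i) + J*_{k+1}(j)]
            q = sum(p * (C[u][i][j] + J[k + 1][j])
                    for j, p in P[u][i].items() if p > 0)
            if q < best:
                best_u, best = u, q
        J[k][i], U[k][i] = best, best_u
```

Under these invented numbers the optimal decision rule is stage-dependent: near the horizon the expected penalty makes replacement worthwhile from the working state, while earlier it is not.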
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with
• N stages

• N_X state variables; the size of the set for each state variable is S

• N_U control variables; the size of the set for each control variable is A
The time complexity of the algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.
Of course, maintenance states should be considered in both cases. It would be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbances a system is, or can be, subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model over its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but, in return, increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Moreover, if there is little consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be useful for optimizing maintenance actions at offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamic of the system only depends on the current state of the system (and possibly on the time, if the system dynamic is not stationary).
This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.
For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamic of the deterioration process.
Chapter 6
Infinite Horizon Models -
Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time. The dynamic of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}. μ is a function mapping the state space to the control space. For
each i ∈ Ω^X, μ(i) is an admissible control for the state i: μ(i) ∈ Ω^U(i).
The objective is to find the optimal μ*. It should minimize the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.
J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

Subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1
μ  Decision policy
J*(i)  Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost function for a discounted IHSDP has the form α^k · Cij(u).
As Cij(u) is bounded, the infinite sum will converge (decreasing geometric progression).
J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(Xk+1, μ(Xk), Xk) ]

Subject to Xk+1 = f(Xk, Uk, ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1
α Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.
To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize
J* = min_μ E[ lim_{N→∞} (1/N) Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

Subject to Xk+1 = f(Xk, Uk, ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1
6.2 Optimality Equations
The optimality equations are formulated using the probability function P(j, u, i).
The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
Jμ(i) = min_{μ(i)∈Ω^U(i)} Σ_{j∈Ω^X} Pij(u) · [Cij(u) + Jμ(j)]  ∀i ∈ Ω^X
Jμ(i)  Cost-to-go function of policy μ starting from state i
J*(i)  Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:
Jμ(i) = min_{μ(i)∈Ω^U(i)} Σ_{j∈Ω^X} Pij(u) · [Cij(u) + α · Jμ(j)]  ∀i ∈ Ω^X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy. It can be shown that the algorithm does indeed converge to the optimal solution. If the model is discounted, then the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative criterion must be determined to stop the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the
algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ0. Then it can be described by the following steps.
Step 1: Policy Evaluation

If μq+1 = μq, stop the algorithm. Else, Jμq(i), the solution of the following linear system, is calculated:
Jμq(i) = Σ_{j∈Ω^X} P(j, μq(i), i) · [C(j, μq(i), i) + Jμq(j)]  ∀i ∈ Ω^X
q Iteration number for the policy iteration algorithm
This is the expected cost-to-go function of the system using the policy μq.
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μq+1(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + Jμq(j)]  ∀i ∈ Ω^X

Go back to the policy evaluation step.

The process stops when μq+1 = μq.
At each iteration, the algorithm always improves the policy. If the initial policy μ0 is already good, then the algorithm will converge quickly to the optimal solution.
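The two-step loop can be sketched in code. The model below is an illustrative discounted maintenance MDP (states 0 = good, 1 = failed; actions 0 = do nothing, 1 = maintain or replace; the probabilities, costs and discount factor are invented for illustration): policy evaluation solves the linear system exactly by Gaussian elimination, and policy improvement takes a greedy step.

```python
# Policy iteration on an invented discounted 2-state maintenance MDP.
alpha = 0.9                                 # discount factor
# P[i][u][j]: transition probabilities; C[i][u]: expected one-stage cost
P = [[[0.8, 0.2], [1.0, 0.0]],              # from the good state
     [[0.0, 1.0], [1.0, 0.0]]]              # from the failed state
C = [[0.0, 2.0],                            # good: do nothing / preventive
     [10.0, 15.0]]                          # failed: downtime / replacement
n = 2

def evaluate(mu):
    # Step 1: solve J(i) = C(i, mu(i)) + alpha * sum_j P(i, mu(i), j) J(j),
    # i.e. (I - alpha * P_mu) J = C_mu, by Gaussian elimination.
    A = [[(1.0 if i == j else 0.0) - alpha * P[i][mu[i]][j]
          for j in range(n)] + [C[i][mu[i]]] for i in range(n)]
    for c in range(n):                      # forward elimination with pivoting
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    J = [0.0] * n
    for r in range(n - 1, -1, -1):          # back substitution
        J[r] = (A[r][n] - sum(A[r][j] * J[j] for j in range(r + 1, n))) / A[r][r]
    return J

mu = [0, 0]                                 # initial policy: never maintain
while True:
    J = evaluate(mu)                        # step 1: policy evaluation
    new_mu = [min(range(2), key=lambda u: C[i][u] +
                  alpha * sum(P[i][u][j] * J[j] for j in range(n)))
              for i in range(n)]            # step 2: policy improvement
    if new_mu == mu:                        # policy is its own improvement
        break
    mu = new_mu
```

With these invented numbers, the loop converges in two improvements to the policy "always maintain", with cost-to-go 20 from the good state and 33 from the failed state.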
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value Jμk(i).
While m ≥ 0 do

J^m_μk(i) = Σ_{j∈Ω^X} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_μk(j)]  ∀i ∈ Ω^X

m ← m − 1

m  Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jμk is approximated by J^0_μk.
6.6 Average Cost-to-go Problems
The methods presented in Sections 6.3-6.5 can not be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X ∈ Ω^X, there is a unique λμ and vector hμ such that

hμ(X) = 0

λμ + hμ(i) = Σ_{j∈Ω^X} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)]  ∀i ∈ Ω^X
This λμ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation
λ* + h*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ Ω^X

μ*(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ Ω^X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state, and h0(i) is chosen arbitrarily:
Hk = min_{u∈Ω^U(X)} Σ_{j∈Ω^X} P(j, u, X) · [C(j, u, X) + hk(j)]

hk+1(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk  ∀i ∈ Ω^X

μk+1(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + hk(j)]  ∀i ∈ Ω^X
The sequence hk will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
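The recursion above can be sketched on an illustrative two-state unichain maintenance model (states 0 = good, 1 = failed; actions 0 = do nothing, 1 = maintain; all numbers invented for illustration). The reference state is X = 0, and at the fixed point Hk approximates the optimal average cost λ*.

```python
# Relative value iteration on an invented 2-state unichain MDP.
P = [[[0.8, 0.2], [1.0, 0.0]],          # P[i][u][j]
     [[0.0, 1.0], [1.0, 0.0]]]
C = [[0.0, 2.0], [10.0, 15.0]]          # expected one-stage cost C[i][u]
n, X = 2, 0                             # X: arbitrary reference state

h = [0.0] * n                           # h0 chosen arbitrarily
for _ in range(100):                    # in theory infinitely many iterations
    # (T h)(i) = min_u sum_j P(j, u, i) * [C(j, u, i) + h(j)]
    T = [min(C[i][u] + sum(P[i][u][j] * h[j] for j in range(n))
             for u in range(2)) for i in range(n)]
    Hk = T[X]                           # value at the reference state
    new_h = [t - Hk for t in T]         # relative values, new_h[X] = 0
    if max(abs(a - b) for a, b in zip(new_h, h)) < 1e-10:
        break                           # converged in practice
    h = new_h

lam = Hk                                # approximates the optimal average cost
mu = [min(range(2), key=lambda u: C[i][u] +
          sum(P[i][u][j] * h[j] for j in range(n))) for i in range(n)]
```

Under these invented numbers, the optimal stationary policy maintains in both states, with average cost per stage λ* = 2 and relative value h(1) = 13.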
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: X can be chosen arbitrarily.
Step 1: Evaluation of the Policy

If λq+1 = λq and hq+1(i) = hq(i) ∀i ∈ Ω^X, stop the algorithm. Else, solve the system of equations

hq(X) = 0

λq + hq(i) = Σ_{j∈Ω^X} P(j, μq(i), i) · [C(j, μq(i), i) + hq(j)]  ∀i ∈ Ω^X
Step 2: Policy Improvement

μq+1(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + hq(j)]  ∀i ∈ Ω^X

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP,

Jμ(i) = min_{μ(i)∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + α · Jμ(j)]  ∀i ∈ Ω^X

Jμ(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω^X} Jμ(i)

Subject to Jμ(i) − α · Σ_{j∈Ω^X} P(j, u, i) · Jμ(j) ≤ Σ_{j∈Ω^X} P(j, u, i) · C(j, u, i)  ∀u, i
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].
Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. Such problems are referred to as Semi-Markov Decision Processes (SMDP).
SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and the actions are not taken continuously (that kind of problem refers to optimal control theory).
SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDP could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach from machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ, using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates after a finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.
The cost-to-go resulting from the trajectory starting from the state Xk is
V (Xk) =Nsum
n=k
C(Xn Xn+1)
V (Xk) Cost-to-go of a trajectory starting from state Xk
If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, then J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} Vm(i)
Vm(i)  Cost-to-go of the trajectory starting from state i at its m-th visit
A recursive form of the method can be formulated:

J(i) = J(i) + γ · [Vm(i) − J(i)], with γ = 1/m, where m is the number of the trajectory
From a trajectory point of view:

J(Xk) = J(Xk) + γXk · [V(Xk) − J(Xk)]

γXk corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) must be calculated from the whole trajectory, and can thus only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).
At each transition of the trajectory, the cost-to-go function J(Xk) of the states of the trajectory is updated. Assume that the l-th transition has just been generated. Then J(Xk) is updated for all the states that have been visited previously during the trajectory:
J(Xk) = J(Xk) + γXk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]  ∀k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:
J(Xk) = J(Xk) + γXk · λ^{l−k} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]  ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is
J(Xk) = J(Xk) + γXk · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
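The TD(0) update can be sketched on an illustrative stochastic shortest path (not from the thesis): from state i < 2 the system moves to i + 1 with probability 0.5 or stays in i, every transition costs 1, and state 2 is terminal, so the exact cost-to-go is J(1) = 2 and J(0) = 4.

```python
# TD(0) policy evaluation by simulation on an invented 3-state
# stochastic shortest path; the exact values are J(0) = 4, J(1) = 2.
import random
random.seed(0)

J = [0.0, 0.0, 0.0]                 # J(2) = 0: terminal state
visits = [0, 0, 0]

for _ in range(20000):              # simulated trajectories
    x = 0
    while x != 2:
        x_next = x + 1 if random.random() < 0.5 else x
        cost = 1.0
        visits[x] += 1
        gamma = 1.0 / visits[x]     # step size 1/m, as in the text
        # TD(0): J(x) <- J(x) + gamma * [C(x, x') + J(x') - J(x)]
        J[x] += gamma * (cost + J[x_next] - J[x])
        x = x_next
```

Each update uses only one observed transition, so the estimate improves during the trajectory instead of waiting for its end.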
Q-factors
Once Jμk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Qμk(i, u) = Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + Jμk(j)]

Note that C(j, u, i) and P(j, u, i) must be known.

The improved policy is

μk+1(i) = argmin_{u∈Ω^U(i)} Qμk(i, u)
This is in fact an approximate version of the policy iteration algorithm, since Jμk and Qμk have been estimated using the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω^U(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω^U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) = (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
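A minimal sketch of this update loop in Python follows. The MDP (three states, two actions), the costs, the step size, and the discount factor alpha are invented for illustration; an ε-greedy rule stands in for the exploration/exploitation trade-off discussed next:

```python
import random

# Hypothetical 3-state, 2-action MDP: ACTIONS[s] lists admissible
# controls, P[(s, u)] the transition law, COST[(s, u)] the stage cost.
ACTIONS = {0: [0, 1], 1: [0, 1], 2: [0]}
P = {(0, 0): [(0, 0.8), (1, 0.2)], (0, 1): [(2, 1.0)],
     (1, 0): [(1, 0.7), (2, 0.3)], (1, 1): [(0, 1.0)],
     (2, 0): [(0, 1.0)]}
COST = {(0, 0): 1.0, (0, 1): 5.0, (1, 0): 2.0, (1, 1): 4.0, (2, 0): 0.0}

def q_learning(n_steps=40000, step=0.1, alpha=0.9, eps=0.2, seed=0):
    """Q(x,u) <- (1-step)*Q(x,u) + step*[C + alpha*min_v Q(x',v)],
    driven by an epsilon-greedy choice of the control."""
    rng = random.Random(seed)
    Q = {(s, u): 0.0 for s in ACTIONS for u in ACTIONS[s]}
    x = 0
    for _ in range(n_steps):
        if rng.random() < eps:                 # explore: random control
            u = rng.choice(ACTIONS[x])
        else:                                  # exploit: greedy control
            u = min(ACTIONS[x], key=lambda a: Q[(x, a)])
        r, acc = rng.random(), 0.0
        x2 = P[(x, u)][-1][0]                  # fallback guards rounding
        for s2, p in P[(x, u)]:
            acc += p
            if r <= acc:
                x2 = s2
                break
        target = COST[(x, u)] + alpha * min(Q[(x2, v)] for v in ACTIONS[x2])
        Q[(x, u)] = (1 - step) * Q[(x, u)] + step * target
        x = x2
    return Q
```

For these invented costs, the greedy policy extracted from the learned Q-factors keeps the cheap action u = 0 in state 0, which matches what exact value iteration on this small MDP would give.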
The trade-off between exploration and exploitation. Convergence of the algorithms to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.
In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

• using the direct learning approach presented in the preceding section for each sample of experience, or

• building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems, but for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function Jμ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the table representation investigated previously, Jμ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) − J(i, r).
There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem is:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no real training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
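The fitting step can be illustrated with the simplest possible approximation structure: a linear function J(i, r) = r0 + r1·i fitted by least squares. The "system" behind the samples is hypothetical (a true cost-to-go of roughly 3 + 2i observed with noise), and a real application would use a richer approximator such as a neural network:

```python
import random

def fit_linear_cost_to_go(samples):
    """Closed-form least-squares fit of J(i, r) = r0 + r1 * i to
    sample pairs (i, J_hat(i)); returns the parameter vector (r0, r1)."""
    n = len(samples)
    sx = sum(i for i, _ in samples)
    sy = sum(j for _, j in samples)
    sxx = sum(i * i for i, _ in samples)
    sxy = sum(i * j for i, j in samples)
    r1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    r0 = (sy - r1 * sx) / n                          # intercept
    return r0, r1

# Hypothetical training set: noisy observations of a true cost-to-go
# that is approximately 3 + 2*i (all numbers invented).
rng = random.Random(42)
train = [(i, 3.0 + 2.0 * i + rng.gauss(0, 0.5)) for i in range(50)]
r0, r1 = fit_linear_cost_to_go(train)
```

Only the two numbers (r0, r1) need to be stored, instead of one table entry per state, which is the point of the approximation structure.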
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates
are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants, the main advantage being the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is raised. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP has the advantage of being able to optimize the time to the next inspection depending on the state, but is also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes (MDP)
  Characteristics: stationary model
  Methods: classical MDP methods; possible approaches:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI), which can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; Policy Iteration (PI), faster in general
  - Shortest path: Linear Programming, which allows possible additional constraints but is limited in state space compared with VI and PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces
  Possible application: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes (SMDP)
  Characteristics: can optimize the inspection interval
  Possible application: optimization for inspection-based maintenance
  Methods: same as MDP
  Advantages/disadvantages: complex (average cost-to-go approach)
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, so as to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for the component
NCM   Number of corrective maintenance states for the component
Costs
CE(s, k)  Electricity cost at stage k for the electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i
Variables
i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage
State and Control Space

x1k  Component state at stage k
x2k  Electricity state at stage k
Probability function
λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi
Sets

Ωx1    Component state space
Ωx2    Electricity state space
ΩU(i)  Decision space for state i
State notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption of CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario; NX = 2. The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component, and Ωx2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This second approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. States W0–W4, PM1, CM1, CM2; from each working state Wq there is a transition with probability Ts·λ(q) to CM1 and with probability 1 − Ts·λ(q) to the next working state. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}
Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3 (electricity prices in SEK/MWh for scenarios 1–3 over stages k−1, k, k+1).
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, ∅ else
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1 | u, i1) · Pk(j2 | i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1. Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2 | i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2 | i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                        u  j1     P(j1 | u, i1)
Wq, q ∈ {0, ..., NW−1}    0  Wq+1   1 − Ts·λ(Wq)
Wq, q ∈ {0, ..., NW−1}    0  CM1    Ts·λ(Wq)
WNW                       0  WNW    1 − Ts·λ(WNW)
WNW                       0  CM1    Ts·λ(WNW)
Wq, q ∈ {0, ..., NW}      1  PM1    1
PMq, q ∈ {1, ..., NPM−2}  ∅  PMq+1  1
PM(NPM−1)                 ∅  W0     1
CMq, q ∈ {1, ..., NCM−2}  ∅  CMq+1  1
CM(NCM−1)                 ∅  W0     1
Table 9.2: Example of transition matrices for electricity scenarios

P1E = [ 1   0   0 ]    P2E = [ 1/3 1/3 1/3 ]    P3E = [ 0.6 0.2 0.2 ]
      [ 0   1   0 ]          [ 1/3 1/3 1/3 ]          [ 0.2 0.6 0.2 ]
      [ 0   0   1 ]          [ 1/3 1/3 1/3 ]          [ 0.2 0.2 0.6 ]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)   0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2|i2)   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
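The stage-dependent scenario dynamics of Tables 9.2 and 9.3 can be simulated directly. The sketch below samples one path of the electricity scenario state over the 12-stage horizon; the matrices are those of the example, and the sampling helper is generic:

```python
import random

# Stage-dependent scenario transition matrices from Tables 9.2 and 9.3
# (rows: current scenario i2, columns: next scenario j2).
P1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
third = 1.0 / 3.0
P2 = [[third] * 3, [third] * 3, [third] * 3]
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
SCHEDULE = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

def simulate_scenarios(start=0, seed=7):
    """Sample one 12-stage path of the electricity scenario state."""
    rng = random.Random(seed)
    path, s = [start], start
    for Pk in SCHEDULE:
        r, acc = rng.random(), 0.0
        nxt = len(Pk[s]) - 1          # fallback guards float rounding
        for j, p in enumerate(Pk[s]):
            acc += p
            if r <= acc:
                nxt = j
                break
        s = nxt
        path.append(s)
    return path
```

Since P1E is the identity, the scenario is frozen at the start and end of the horizon and can only change during the "transient" middle stages, which matches the dry-year/wet-year interpretation above.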
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon; a possible terminal cost is defined by CN(i) for each possible terminal state i of the component. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4: Transition costs

i1                        u  j1     Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}    0  Wq+1   G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}    0  CM1    CI + CCM
WNW                       0  WNW    G · Ts · CE(i2, k)
WNW                       0  CM1    CI + CCM
Wq                        1  PM1    CI + CPM
PMq, q ∈ {1, ..., NPM−2}  ∅  PMq+1  CI + CPM
PM(NPM−1)                 ∅  W0     CI + CPM
CMq, q ∈ {1, ..., NCM−2}  ∅  CMq+1  CI + CCM
CM(NCM−1)                 ∅  W0     CI + CCM
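With the transition and cost tables in place, the one-component model can be solved by backward induction (value iteration). The sketch below fixes the electricity price to a single value and invents all the numbers (per-stage failure probabilities Ts·λ(q), costs, production reward, horizon), so it only illustrates the mechanics of the model, not the thesis's numerical results. It uses the layout of Figure 9.1 (NW = 4, NPM = 2, NCM = 3), with the generation reward entered as a negative cost:

```python
# Hypothetical data for a one-component replacement model.
N = 12                                 # number of stages
FAIL = [0.02, 0.05, 0.1, 0.2, 0.4]     # Ts*lambda(q) for W0..W4
REWARD = 10.0                          # G*Ts*CE, reward per working stage
CI, CPM, CCM = 5.0, 8.0, 20.0
STATES = ["W0", "W1", "W2", "W3", "W4", "PM1", "CM1", "CM2"]

def transitions(state, u):
    """List of (next_state, probability, stage_cost); u=1 starts PM."""
    if state.startswith("W"):
        q = int(state[1])
        if u == 1:
            return [("PM1", 1.0, CI + CPM)]
        nxt = "W%d" % min(q + 1, 4)        # age saturates at W4
        return [(nxt, 1.0 - FAIL[q], -REWARD), ("CM1", FAIL[q], CI + CCM)]
    if state == "PM1":
        return [("W0", 1.0, CI + CPM)]     # second and last PM stage
    if state == "CM1":
        return [("CM2", 1.0, CI + CCM)]
    return [("W0", 1.0, CI + CCM)]         # CM2 -> new component

def solve():
    """Backward induction: J_N = 0, then J_k from J_{k+1}."""
    J = {s: 0.0 for s in STATES}           # zero terminal cost
    policy = []
    for k in range(N - 1, -1, -1):
        Jk, uk = {}, {}
        for s in STATES:
            best = None
            controls = [0, 1] if s.startswith("W") else [None]
            for u in controls:
                v = sum(p * (c + J[t]) for t, p, c in transitions(s, u))
                if best is None or v < best:
                    best, uk[s] = v, u
            Jk[s] = best
        J, policy = Jk, [uk] + policy
    return J, policy
```

A negative J indicates a net expected reward over the remaining horizon; in maintenance states the control is ∅ (None here), as in Table 9.1.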
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.
This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC    Number of components
NWc   Number of working states for component c
NPMc  Number of preventive maintenance states for component c
NCMc  Number of corrective maintenance states for component c
Costs
CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}  State of component c at the current stage
iNC+1                 Electricity state at the current stage
jc, c ∈ {1, ..., NC}  State of component c for the next stage
jNC+1                 Electricity state for the next stage
uc, c ∈ {1, ..., NC}  Decision variable for component c
State and Control Space

xck, c ∈ {1, ..., NC}  State of component c at stage k
xc                     A component state
xNC+1k                 Electricity state at stage k
uck                    Maintenance decision for component c at stage k
Probability functions

λc(i)  Failure probability function for component c

Sets

Ωxc      State space for component c
ΩxNC+1   Electricity state space
Ωuc(ic)  Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component, to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.
Component space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for the one-component model. The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}
Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, ∅ else
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1 | iNC+1)   (9.5)

The transition probabilities of the electricity state, Pk(jNC+1 | iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC) | 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc | 0, ic)

Case 2

If one of the components is in maintenance, or preventive maintenance is decided, then

P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with P^c = P(jc | 1, ic) if uc = 1 or ic ∉ {W1, ..., WNWc}
     P^c = 1 if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic
     P^c = 0 else
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with Cc = CCMc if ic ∈ {CM1, ..., CM(NCMc)} or jc = CM1
     Cc = CPMc if ic ∈ {PM1, ..., PM(NPMc)} or jc = PM1
     Cc = 0 else
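The two-case cost logic above can be sketched compactly. All numbers (interruption cost, per-component maintenance costs, production reward) are hypothetical, and component states are encoded as strings such as "W2", "PM1" or "CM1":

```python
# Hypothetical data for a 2-component system.
CI = 5.0
G_TS_CE = 10.0                 # production reward G*Ts*CE at current price
CPM = {1: 8.0, 2: 6.0}         # per-stage PM cost for components 1 and 2
CCM = {1: 20.0, 2: 15.0}       # per-stage CM cost for components 1 and 2

def stage_cost(states, decisions):
    """Stage cost following Section 9.2.4.4: Case 1 yields the
    production reward (as a negative cost), Case 2 the interruption
    cost plus the maintenance costs of all affected components."""
    working = all(s.startswith("W") for s in states.values())
    if working and not any(decisions.values()):
        return -G_TS_CE                    # Case 1: energy is produced
    cost = CI                              # Case 2: system interrupted
    for c, s in states.items():
        if s.startswith("CM"):
            cost += CCM[c]                 # ongoing corrective maintenance
        elif s.startswith("PM") or decisions[c]:
            cost += CPM[c]                 # ongoing or newly decided PM
    return cost
```

The opportunistic effect is visible here: deciding u2 = 1 while component 1 is already failed adds only CPM2 on top of an interruption cost that is being paid anyway.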
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas and issues that could have an impact on the model:
• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.
• Other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is shown empirically to converge fastest; however, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome, although no application of ADP to maintenance was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = CN(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
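The backward recursion above can be cross-checked with a short script. This is an illustrative sketch, not code from the thesis: the dictionary encoding of the arc costs C[k][(i, u)] and all variable names are choices made here.

```python
# Arc costs C[k][(i, u)]: cost of choosing next-stage state u from state i at stage k.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},                # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6,                            # B -> E, F
        (1, 0): 2, (1, 1): 1, (1, 2): 3,                 # C -> E, F, G
        (2, 1): 5, (2, 2): 2},                           # D -> F, G
    2: {(0, 0): 2, (0, 1): 5,                            # E -> H, I
        (1, 0): 7, (1, 1): 3, (1, 2): 2,                 # F -> H, I, J
        (2, 1): 1, (2, 2): 2},                           # G -> I, J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},                # H, I, J -> K
}

N = 4
J = {N: {0: 0}}                         # terminal cost at node K
policy = {}
for k in range(N - 1, -1, -1):          # backward value iteration
    J[k], policy[k] = {}, {}
    states = {i for (i, _) in C[k]}
    for i in states:
        options = {u: c + J[k + 1][u] for (s, u), c in C[k].items() if s == i}
        u_star = min(options, key=options.get)
        J[k][i], policy[k][i] = options[u_star], u_star

print(J[0][0])          # -> 8, optimal cost-to-go from A

# Recover the optimal path forward
path, i = [0], 0
for k in range(N):
    i = policy[k][i]
    path.append(i)
print(path)             # -> [0, 2, 2, 1, 0], i.e. A => D => G => I => K
```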
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Computers & Operations Research, 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS '06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of the 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence '99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA '99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Thyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.
For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2] the inspection occurs at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 4) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.
An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (monitored variables).
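The multiplicative structure of this assumption can be sketched as follows. This is a minimal illustration: the function names, the Weibull-type baseline and the coefficient values are assumptions made here, not taken from [25].

```python
import math

def hazard(t, z, beta, baseline):
    """Proportional hazards assumption: h(t, z) = h0(t) * exp(beta . z).

    t        -- age of the component
    z        -- list of monitored covariate values
    beta     -- regression coefficients (one per covariate)
    baseline -- function t -> h0(t), the age-dependent baseline hazard
    """
    return baseline(t) * math.exp(sum(b * zi for b, zi in zip(beta, z)))

# Illustrative Weibull-type baseline (shape 2, scale 1000 h) -- made-up numbers
h0 = lambda t: (2 / 1000.0) * (t / 1000.0)
print(hazard(500.0, [0.3], [1.2], h0))
```

With a zero covariate vector the hazard reduces to the baseline h0(t), which is exactly the separation between age effect and monitored-variable effect described above.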
2.2.4 Opportunistic Maintenance Models
Opportunistic maintenance considers unexpected opportunities of performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: the transport to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money can be saved.
Haurie and L'Ecuyer [19] focus on a group preventive replacement policy of m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.
A rolling horizon dynamic programming algorithm is proposed in [45] to take short-term information into account. The model can be used for many maintenance optimization models.
2.2.5 Other Types of Models and Criteria of Classification
Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.
Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g., multi-component models are more interesting in power systems. The time horizon considered in the model is also important: many articles consider an infinite time horizon, but more focus should be put on finite horizons since they are more practical. Another characteristic of the model is the time representation, i.e. whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.
The method used for solving the problem has an influence on the solution; a model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.
3.1 Power System Presentation
Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description
A simple description of the power system includes the following main parts:
1. Generation: The generation units produce the power. These can be, e.g., hydro power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.
2. Transmission: The transmission system is composed of high-voltage, high-power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.
3. Distribution: The distribution system is at a voltage level below transmission and connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).
4. Consumption: The consumers can be divided into different categories: industry, commercial, households, offices, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs will also depend on the time of the outage.
The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.
The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as blackouts.
3.1.2 Maintenance in Power Systems
The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber and Bertling [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).
Research about power generation is typically focusing on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
3.2 Costs
Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: The cost for the maintenance team that performs maintenance actions.
• Spare part cost: The cost of a new component is an important part of the maintenance cost.
• Maintenance equipment cost: If special equipment is needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.
• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.
• Unserved energy/interruption cost: If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.
• Inspection/monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: The size and availability of the maintenance staff is limited.
• Maintenance Equipment: The equipment needed for undertaking the maintenance must be available.
• Weather: The weather can force certain maintenance actions to be postponed; e.g., in very windy conditions it is not possible to perform maintenance on offshore wind farms.
• Availability of the Spare Parts: If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.
• Maintenance Contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.
• Availability of Condition Monitoring Information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.
• Statistical Data: Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:
An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves parts of the solution of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not influence the future evolution of the system and the possible actions.
Basically, in maintenance problems it would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment; they do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.
If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.
Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 4 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.
Chapters 5 and 6 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner over time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of a system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 3, 4 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the time interval between two stages will have an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 5. Continuous decision making refers to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 4.2).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 6 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).
Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost Ck(i, u) that the decision maker has to pay. A possible terminal cost CN(XN) is associated with the terminal state (the state at stage N).
Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:
J*_0(X0) = min_{Uk} { Σ_{k=0}^{N−1} Ck(Xk, Uk) + CN(XN) }

subject to Xk+1 = fk(Xk, Uk), k = 0, ..., N − 1

N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
Xk: state at stage k
Uk: decision (action) at stage k
Ck(i, u): cost function
CN(i): terminal cost for state i
fk(i, u): dynamic function
J*_0(i): optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:
J*_k(i) = min_{u ∈ ΩUk(i)} { Ck(i, u) + J*_{k+1}(fk(i, u)) }    (4.1)
J*_k(i): optimal cost-to-go from stage k to N, starting from state i
The value iteration algorithm is a direct consequence of the optimality equation:
J*_N(i) = CN(i), ∀i ∈ ΩXN

J*_k(i) = min_{u ∈ ΩUk(i)} { Ck(i, u) + J*_{k+1}(fk(i, u)) }, ∀i ∈ ΩXk

U*_k(i) = argmin_{u ∈ ΩUk(i)} { Ck(i, u) + J*_{k+1}(fk(i, u)) }, ∀i ∈ ΩXk

u: decision variable
U*_k(i): optimal decision action at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
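The three equations above translate directly into a backward recursion. The following is a generic sketch written for this text; the interface (e.g. states(k), actions(k, i)) and the tiny artificial instance at the end are choices made here, not from the thesis.

```python
def value_iteration(N, states, actions, f, cost, terminal_cost):
    """Backward value iteration for a deterministic finite-horizon DP.

    states(k)        -> iterable of states at stage k
    actions(k, i)    -> iterable of admissible actions in state i at stage k
    f(k, i, u)       -> next state f_k(i, u)
    cost(k, i, u)    -> stage cost C_k(i, u)
    terminal_cost(i) -> terminal cost C_N(i)
    Returns the cost-to-go tables J[k][i] and the optimal policy U[k][i].
    """
    J = {N: {i: terminal_cost(i) for i in states(N)}}
    U = {}
    for k in range(N - 1, -1, -1):          # backwards, stops after k = 0
        J[k], U[k] = {}, {}
        for i in states(k):
            best_u = min(actions(k, i),
                         key=lambda u: cost(k, i, u) + J[k + 1][f(k, i, u)])
            J[k][i] = cost(k, i, best_u) + J[k + 1][f(k, i, best_u)]
            U[k][i] = best_u
    return J, U

# Tiny artificial instance: two states, action u moves the system to state u,
# stage cost |i - u| + 1, no terminal cost.
J, U = value_iteration(
    N=2,
    states=lambda k: [0, 1],
    actions=lambda k, i: [0, 1],
    f=lambda k, i, u: u,
    cost=lambda k, i, u: abs(i - u) + 1,
    terminal_cost=lambda i: 0,
)
print(J[0], U[0])   # -> {0: 2, 1: 2} {0: 0, 1: 1}
```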
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: the shortest path network]

            B     E     H
      A     C     F     I     K
            D     G     J

   Stage 0   Stage 1   Stage 2   Stage 3   Stage 4

Arc costs: A⇒B = 2, A⇒C = 4, A⇒D = 3; B⇒E = 4, B⇒F = 6; C⇒E = 2, C⇒F = 1, C⇒G = 3; D⇒F = 5, D⇒G = 2; E⇒H = 2, E⇒I = 5; F⇒H = 7, F⇒I = 3, F⇒J = 2; G⇒I = 1, G⇒J = 2; H⇒K = 4, I⇒K = 2, J⇒K = 7.
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. One first way to solve the problem would be to calculate the cost of all possible paths. For example, the path A-B-F-J-K has a cost of 2 + 6 + 2 + 7 = 17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}
Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to the next stage. The following notations are used:

ΩUk(i) = {0, 1} for i = 0, {0, 1, 2} for i = 1, {1, 2} for i = 2, for k = 1, 2

ΩU0(0) = {0, 1, 2} for k = 0

ΩU3(i) = {0} for all i (every node of stage 3 leads to K)
For example, ΩU1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E or U1(0) = 1 for the transition B ⇒ F.
Another example: ΩU1(2) = ΩU(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F or u1(2) = 2 for the transition D ⇒ G.
A sequence π = {µ0, µ1, ..., µN}, where µk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {µ*_0, µ*_1, ..., µ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notation used: fk(i, u) = u.
The transition costs are defined equal to the distance from one state to the resulting state of the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J*_0(0) = min_{Uk ∈ ΩUk(Xk)} { Σ_{k=0}^{3} Ck(Xk, Uk) + C4(X4) }

subject to Xk+1 = fk(Xk, Uk), k = 0, 1, ..., 3
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem.
The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forward by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.
The solution of the algorithm is given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {µ0, µ1, µ2, µ3, µ4}, with µk(i) = u*_k(i) (for example, µ1(1) = 2, µ1(2) = 2).
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below.
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ ΩXk.
Decision Space
At each decision epoch, the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).
Dynamic of the System and Transition Probability
Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):

Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N − 1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control at stage k are i and u. These probabilities can also depend on the stage:

Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function

A cost is associated to each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k ∈ Ω_{U_k}(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)),  k = 0, 1, ..., N − 1
N : number of stages
k : stage
i : state at the current stage
j : state at the next stage
X_k : state at stage k
U_k : decision (action) at stage k
ω_k(i, u) : probabilistic function of the disturbance
C_k(j, u, i) : cost function
C_N(i) : terminal cost for state i
f_k(i, u, ω) : dynamic function
J*_0(i) : optimal cost-to-go starting from state i
5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} E[ C_k(i, u, ω) + J*_{k+1}(f_k(i, u, ω)) ]    (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. It can be rewritten using the transition probabilities:

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]    (5.2)

Ω_{X_k} : state space at stage k
Ω_{U_k}(i) : decision space at stage k for state i
P_k(j, u, i) : transition probability function
5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage; by backward recursion it determines, at each stage, the optimal decision for each state of the system.

J*_N(i) = C_N(i)  ∀i ∈ Ω_{X_N}  (initialisation)

k = N − 1
While k ≥ 0 do
  J*_k(i) = min_{u ∈ Ω_{U_k}(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]  ∀i ∈ Ω_{X_k}
  U*_k(i) = argmin_{u ∈ Ω_{U_k}(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]  ∀i ∈ Ω_{X_k}
  k ← k − 1
u : decision variable
U*_k(i) : optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
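Since the thesis gives no code, a minimal sketch of this backward recursion may help. The two-state component model below (good/failed, with actions keep/replace) is hypothetical; all probabilities and costs are invented for illustration, not taken from the thesis.

```python
import numpy as np

def finite_horizon_vi(P, C, C_N):
    """Backward value iteration for a finite-horizon stochastic DP.

    P[k][u][i][j] : transition probability P_k(j, u, i)
    C[k][u][i][j] : transition cost        C_k(j, u, i)
    C_N[i]        : terminal cost
    Returns the optimal cost-to-go J[k][i] and optimal decisions U[k][i].
    """
    N, n_states = len(P), len(C_N)
    J = np.zeros((N + 1, n_states))
    U = np.zeros((N, n_states), dtype=int)
    J[N] = C_N                                  # initialisation: J*_N(i) = C_N(i)
    for k in range(N - 1, -1, -1):              # backward recursion
        # Q[i][u] = sum_j P_k(j,u,i) * (C_k(j,u,i) + J*_{k+1}(j))
        Q = np.array([[P[k][u][i] @ (C[k][u][i] + J[k + 1])
                       for u in range(len(P[k]))]
                      for i in range(n_states)])
        J[k] = Q.min(axis=1)
        U[k] = Q.argmin(axis=1)
    return J, U

# Hypothetical 2-state component: 0 = good, 1 = failed; actions: 0 = keep, 1 = replace.
P_keep = np.array([[0.8, 0.2], [0.0, 1.0]])    # a good unit fails with prob. 0.2
P_rep  = np.array([[1.0, 0.0], [1.0, 0.0]])    # replacement always restores the unit
C_keep = np.array([[0.0, 10.0], [0.0, 10.0]])  # a failed stage costs 10
C_rep  = np.array([[5.0, 5.0], [5.0, 5.0]])    # replacement costs 5
P = [[P_keep, P_rep]] * 3                      # stationary data over N = 3 stages
C = [[C_keep, C_rep]] * 3
J, U = finite_horizon_vi(P, C, np.zeros(2))
```

With these numbers the recursion gives J*_0(good) = 7.08 and prescribes replacement whenever the component is failed.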
5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages

• N_X state variables; the size of the set for each state variable is S

• N_U control variables; the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem thus increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
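The growth can be made concrete with a small back-of-the-envelope computation; the stage, state and action counts below are hypothetical, chosen only to illustrate the formula:

```python
# Growth of the operation count N * S**(2*N_X) * A**N_U when every added
# component contributes one state variable and one decision variable.
N, S, A = 52, 5, 3   # hypothetical: 52 weekly stages, 5 states and 3 actions per component
for n_comp in range(1, 5):
    ops = N * S ** (2 * n_comp) * A ** n_comp
    print(f"{n_comp} component(s): ~{ops:.1e} elementary operations")
```

Already at four components the count passes 10^9, which is why multi-component finite horizon models must stay small.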
5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time, so a possible state variable for a component is its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could also be possible to have different types of failure states, such as major and minor failures. Minor failures could be cleared by repair, while after a major failure a component should be replaced.
5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions at offshore wind farms.
5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few previous states) to overcome this assumption: variables are added to the DP model to keep the previously visited states in memory. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the previous stage; it would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov decision process. For more details and proofs of the convergence of the algorithms, [36] or the introduction chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximate methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω_X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω_U(i).

The objective is to find the optimal policy μ*, that is, the one minimizing the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))),  k = 0, 1, ..., N − 1

μ : decision policy
J*(i) : optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1); the cost incurred at stage k has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))),  k = 0, 1, ..., N − 1

α : discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))),  k = 0, 1, ..., N − 1
6.2 Optimality Equations

The optimality equations are formulated using the transition probabilities P_ij(u) = P(j, u, i).

The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + J*(j)]  ∀i ∈ Ω_X

J_μ(i) : cost-to-go function of policy μ starting from state i
J*(i) : optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + α · J*(j)]  ∀i ∈ Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively the algorithm should converge to the optimal policy, and it can indeed be shown that it converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite and a relative criterion must be determined to stop the algorithm.

An alternative is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
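A compact sketch of value iteration for a discounted model is given below, on a hypothetical two-state replacement problem (all transition probabilities and costs are invented for illustration):

```python
import numpy as np

def discounted_vi(P, C, alpha, tol=1e-10, max_iter=100_000):
    """Value iteration for a discounted MDP.
    P[u][i][j] = P(j, u, i), C[u][i][j] = C(j, u, i), 0 < alpha < 1."""
    J = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # Q[u][i] = sum_j P(j,u,i) * [C(j,u,i) + alpha * J(j)]
        Q = np.einsum('uij,uij->ui', P, C + alpha * J)
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:     # stop on a relative criterion
            break
        J = J_new
    return J_new, Q.argmin(axis=0)

# states: 0 = good, 1 = failed; actions: 0 = keep, 1 = replace (hypothetical data)
P = np.array([[[0.8, 0.2], [0.0, 1.0]],    # keep: a good unit fails with prob. 0.2
              [[1.0, 0.0], [1.0, 0.0]]])   # replace: always restores the unit
C = np.array([[[0.0, 10.0], [0.0, 10.0]],  # a failed stage costs 10
              [[5.0, 5.0], [5.0, 5.0]]])   # replacement costs 5
J, mu = discounted_vi(P, C, alpha=0.9)
```

For these numbers the fixed point can be checked by hand: keeping in the good state and replacing in the failed state gives J(good) = 2.9/0.118 ≈ 24.58 and J(failed) ≈ 27.12.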
6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ_0. Then it can be described by the following steps.
Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μ_q}(i), the solution of the following linear system, is calculated:

J_{μ_q}(i) = Σ_{j ∈ Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_{μ_q}(j)]  ∀i ∈ Ω_X

q : iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μ_{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ_q}(j)]  ∀i ∈ Ω_X

Go back to the policy evaluation step. The process stops when μ_{q+1} = μ_q.
At each iteration the algorithm always improves the policy. If the initial policy μ_0 is already good, then the algorithm will converge fast to the optimal solution.
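The two steps can be sketched as follows, here for the discounted variant so that the evaluation linear system is always non-singular; the two-state replacement model is hypothetical, with numbers invented for illustration:

```python
import numpy as np

def policy_iteration(P, C, alpha):
    """Policy iteration for a discounted MDP (sketch; P[u][i][j] = P(j,u,i))."""
    n_actions, n_states, _ = P.shape
    idx = np.arange(n_states)
    mu = np.zeros(n_states, dtype=int)          # initial policy mu_0
    while True:
        # Step 1 (evaluation): solve J(i) = sum_j P(j,mu(i),i)[C(j,mu(i),i) + alpha*J(j)]
        P_mu = P[mu, idx]                       # row i is P(., mu(i), i)
        c_mu = np.einsum('ij,ij->i', P_mu, C[mu, idx])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2 (improvement)
        Q = np.einsum('uij,uij->ui', P, C + alpha * J)
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):          # the policy is its own improvement: stop
            return J, mu
        mu = mu_new

# hypothetical 2-state replacement model (0 = good, 1 = failed; keep / replace)
P = np.array([[[0.8, 0.2], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]])
C = np.array([[[0.0, 10.0], [0.0, 10.0]], [[5.0, 5.0], [5.0, 5.0]]])
J, mu = policy_iteration(P, C, alpha=0.9)
```

On this model the algorithm terminates after a handful of improvements, ending at the keep-in-good, replace-in-failed policy.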
6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i).
m = M − 1
While m ≥ 0 do
  J^m_{μ_k}(i) = Σ_{j ∈ Ω_X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j)]  ∀i ∈ Ω_X
  m ← m − 1

m : number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
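A sketch of this approximate evaluation step, written for the discounted variant (where the sweeps converge for any initialisation) on a hypothetical two-state replacement model with invented numbers:

```python
import numpy as np

def approx_policy_evaluation(P, C, mu, alpha, J_init, M):
    """Evaluation step of modified policy iteration: M sweeps of fixed-policy
    value iteration instead of solving the linear system exactly (a sketch)."""
    idx = np.arange(P.shape[1])
    P_mu, C_mu = P[mu, idx], C[mu, idx]
    J = np.asarray(J_init, dtype=float).copy()
    for _ in range(M):
        # J^m(i) = sum_j P(j,mu(i),i) * [C(j,mu(i),i) + alpha * J^{m+1}(j)]
        J = np.einsum('ij,ij->i', P_mu, C_mu + alpha * J)
    return J

# hypothetical 2-state replacement model (0 = good, 1 = failed; keep / replace)
P = np.array([[[0.8, 0.2], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]])
C = np.array([[[0.0, 10.0], [0.0, 10.0]], [[5.0, 5.0], [5.0, 5.0]]])
J = approx_policy_evaluation(P, C, mu=np.array([0, 1]), alpha=0.9,
                             J_init=[100.0, 100.0], M=300)
```

Each sweep contracts the error by the factor α, so after M sweeps the estimate is within α^M of the exact policy cost-to-go.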
6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X̄ ∈ Ω_X, there is a unique λ_μ and vector h_μ such that

h_μ(X̄) = 0

λ_μ + h_μ(i) = Σ_{j ∈ Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)]  ∀i ∈ Ω_X

This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for every starting state.
The optimal average cost and optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ Ω_X

μ*(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ Ω_X
6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems; the adapted method is called relative value iteration. X̄ is an arbitrary reference state and h^0(i) is chosen arbitrarily.
H^k = min_{u ∈ Ω_U(X̄)} Σ_{j ∈ Ω_X} P(j, u, X̄) · [C(j, u, X̄) + h^k(j)]

h^{k+1}(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)] − H^k  ∀i ∈ Ω_X

μ^{k+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)]  ∀i ∈ Ω_X
The sequence h^k will converge if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
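A sketch of the iteration, again on a hypothetical two-state replacement model (invented numbers); for this model the optimal average cost per stage can be checked by hand to be 2.5:

```python
import numpy as np

def relative_value_iteration(P, C, ref=0, iters=1000):
    """Relative value iteration for an average cost-per-stage MDP (unichain assumed).
    P[u][i][j] = P(j, u, i); returns the average-cost estimate H and a policy."""
    h = np.zeros(P.shape[1])
    for _ in range(iters):
        # Q[u][i] = sum_j P(j,u,i) * [C(j,u,i) + h(j)]
        Q = np.einsum('uij,uij->ui', P, C + h)
        H = Q[:, ref].min()              # H^k, taken at the reference state
        h = Q.min(axis=0) - H            # h^{k+1}(i); h(ref) stays 0
    return H, Q.argmin(axis=0)

# hypothetical 2-state replacement model (0 = good, 1 = failed; keep / replace)
P = np.array([[[0.8, 0.2], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]])
C = np.array([[[0.0, 10.0], [0.0, 10.0]], [[5.0, 5.0], [5.0, 5.0]]])
H, mu = relative_value_iteration(P, C)
```

Under the keep-in-good, replace-in-failed policy the stationary distribution is (5/6, 1/6) and the average cost is (5/6)·2 + (1/6)·5 = 2.5, which the iteration recovers.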
662 Policy Iteration
The problem can also be solved using the policy iteration algorithm.

Initialisation: X̄ can be chosen arbitrarily.

Step 1: Policy evaluation
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω_X, stop the algorithm. Else, solve the system of equations

h_q(X̄) = 0

λ_q + h_q(i) = Σ_{j ∈ Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)]  ∀i ∈ Ω_X

Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h_q(j)]  ∀i ∈ Ω_X

q = q + 1
6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case the optimal cost-to-go function satisfies

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)]  ∀i ∈ Ω_X

and J*(i) is the solution of the following linear programming model:

Maximize Σ_{i ∈ Ω_X} J(i)

Subject to J(i) ≤ Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + α · J(j)]  ∀i, u
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
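As a sketch, this LP can be handed to a generic solver; below scipy's linprog is used on a hypothetical two-state model (invented numbers). linprog minimizes, so the objective is negated:

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """Solve a discounted-cost MDP by LP (a sketch):
    maximize sum_i J(i)  s.t.  J(i) <= sum_j P(j,u,i)[C(j,u,i) + alpha*J(j)] for all i,u."""
    n_actions, n_states, _ = P.shape
    c_u = np.einsum('uij,uij->ui', P, C)          # expected immediate cost of (u, i)
    # constraint rows: J(i) - alpha * sum_j P(j,u,i) J(j) <= c_u(u,i), stacked over u
    A_ub = np.vstack([np.eye(n_states) - alpha * P[u] for u in range(n_actions)])
    b_ub = np.concatenate([c_u[u] for u in range(n_actions)])
    res = linprog(-np.ones(n_states), A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    return res.x

# hypothetical 2-state replacement model (0 = good, 1 = failed; keep / replace)
P = np.array([[[0.8, 0.2], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]])
C = np.array([[[0.0, 10.0], [0.0, 10.0]], [[5.0, 5.0], [5.0, 5.0]]])
J = solve_discounted_mdp_lp(P, C, alpha=0.9)
```

The LP optimum coincides with the fixed point of the discounted optimality equation, since J* is the largest J satisfying J ≤ TJ componentwise.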
6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

DP methods find an optimal policy in polynomial time: if n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm always improves the policy at each iteration, it will converge quite fast if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Processes

Until now the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform a SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Process - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any kind of possible future input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require an explicit model of the system to exist. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space, and are presented in Section 7.4.
7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration algorithm discussed in Chapter 6. It can be seen in a similar way as modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from the state X_k is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k) : cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m) : cost-to-go of the trajectory starting from state i at the m-th visit
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)],  with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

where γ_{X_k} corresponds to 1/m, with m the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can therefore only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume the l-th transition has just been generated; then J(X_k) is updated for all the states visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]  ∀k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]  ∀k = 0, ..., l

Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0, for which only the current state is updated; the TD(0) algorithm is

J(X_l) := J(X_l) + γ_{X_l} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]
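The TD(0) update can be sketched on a toy stochastic shortest path chain (a hypothetical example, not from the thesis): state 0 moves to state 1 with cost 1; state 1 terminates with probability 0.5 (cost 2) or stays in 1 (cost 2). The true values are J(1) = 4 and J(0) = 5.

```python
import numpy as np

def td0_policy_evaluation(episodes, seed=0):
    """TD(0) evaluation of a fixed policy on a toy stochastic shortest path chain.
    States: 0, 1, and the terminal state 2."""
    rng = np.random.default_rng(seed)
    J = np.zeros(3)
    visits = np.zeros(3)
    for _ in range(episodes):
        x = 0
        while x != 2:
            if x == 0:
                x_next, cost = 1, 1.0
            else:
                x_next, cost = (2, 2.0) if rng.random() < 0.5 else (1, 2.0)
            visits[x] += 1
            gamma = 1.0 / visits[x]            # step size 1/m, as in the text
            # TD(0) update: J(x) <- J(x) + gamma * [C(x, x') + J(x') - J(x)]
            J[x] += gamma * (cost + J[x_next] - J[x])
            x = x_next
    return J

J = td0_policy_evaluation(20_000)
```

After enough episodes the estimates approach the analytic values J(1) = 2 + 0.5·J(1) = 4 and J(0) = 1 + J(1) = 5.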
Q-factors
Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_{μ_k}(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u ∈ Ω_U(i)} Q_{μ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated using the samples.
7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω_U(i)} Q*(i, u)    (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + min_{v ∈ Ω_U(j)} Q*(j, v)]    (7.3)
Q*(i, u) is the unique solution of this equation, and the Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = argmin_{u ∈ Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The trade-off between exploration and exploitation. The convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called the greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
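A common way to realise this trade-off is an ε-greedy rule: explore with probability ε, otherwise act greedily. The sketch below applies it to a hypothetical two-state component model (good/failed, keep/replace; all numbers invented). It uses the discounted variant of the update, and a step size visits^{-0.6} that decays a little slower than the 1/m of the text, which speeds up convergence in practice:

```python
import numpy as np

def q_learning(steps=300_000, alpha=0.9, eps=0.3, seed=0):
    """Q-learning with epsilon-greedy exploration on a hypothetical 2-state model
    (states: 0 = good, 1 = failed; actions: 0 = keep, 1 = replace).
    Discounted target: C + alpha * min_u Q(x', u)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((2, 2))
    visits = np.zeros((2, 2))
    x = 0
    for _ in range(steps):
        # exploration vs. exploitation
        u = int(rng.integers(2)) if rng.random() < eps else int(Q[x].argmin())
        if u == 1:                                    # replace: back to good, cost 5
            x_next, cost = 0, 5.0
        elif x == 0:                                  # keep a good component
            x_next = 1 if rng.random() < 0.2 else 0
            cost = 10.0 if x_next == 1 else 0.0
        else:                                         # keep a failed component
            x_next, cost = 1, 10.0
        visits[x, u] += 1
        gamma = visits[x, u] ** -0.6
        Q[x, u] = (1 - gamma) * Q[x, u] + gamma * (cost + alpha * Q[x_next].min())
        x = x_next
    return Q

Q = q_learning()
```

The greedy policy extracted from the learned Q-factors keeps a good component and replaces a failed one, matching what exact dynamic programming gives for this model.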
7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems, but for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the table representation investigated previously, J_μ(i) was stored for all values of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the one performed in reinforcement learning is that a real training set does not exist: the training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs, and penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models

In [37] a SDP model is proposed to solve a finite horizon generating-unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failures and deterioration failures, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm; the model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible, and for each possible maintenance action the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
822 Semi-Markov Decision Process
Many condition-based maintenance models based on SMDP have been proposedthese last years
Amari et al [3] present a general framework for solving condition-based mainte-nance problems by using SMDP The interest of the model is that for each possibledeterioration state possible maintenance decisions are minor maintenance majormaintenance (replacement) but also the choice for the next inspection time Anhypothetical example is given The model consists of 5 deterioration states and 1failure state 20 possible values for the inspection time are considered
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants; the main advantage given is the automatic learning capability of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be run in parallel with the existing expert systems, so that the RL algorithm learns the environment; it could then be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, above all in critical ones.
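The article in question does not include an implementation. Purely as a generic illustration of the kind of method involved (this is a standard tabular Q-learning sketch with hypothetical state and action sets, not code from [24]):

```python
import random

def q_learning(env_step, states, actions, episodes=500,
               alpha=0.1, gamma=0.95, eps=0.1, rng=random):
    """Minimal tabular Q-learning sketch (generic RL, not the article's code).

    env_step(s, a) -> (next_state, reward, done) simulates the plant;
    no explicit transition model is required, which is the main
    advantage discussed above.
    """
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s, done = states[0], False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s2, r, done = env_step(s, a)
            # one-step temporal-difference update
            best_next = max(Q[(s2, act)] for act in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```

The learning-by-sampling loop is the point here: the agent only ever calls the simulator, which is why such a method could first run in parallel with an expert system before being trusted in practice.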
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality: it means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require a model of the system to exist: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 summarizes the models and the most important methods.
Table 8.1: Summary of models and methods

- Finite horizon dynamic programming. Characteristics: the model can be non-stationary. Application: short-term maintenance scheduling. Method: value iteration. Disadvantage: limited state space (number of components).
- Markov decision processes (stationary models). Classical methods; possible approaches:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor.
  - Discounted: short-term maintenance optimization; policy iteration (PI) is faster in general.
  - Shortest path: linear programming allows additional constraints; the state space is more limited than with VI and PI.
- Semi-Markov decision processes. Characteristics: can optimize the inspection interval. Application: inspection-based maintenance optimization. Methods: same as MDP. Disadvantage: complex (average cost-to-go approach).
- Approximate dynamic programming for MDP. Characteristics: can handle large state spaces. Application: same as MDP, for systems larger than classical MDP methods can treat. Methods: TD-learning, Q-learning. Advantage: can work without an explicit model.
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
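The backward value iteration recursion used to solve such models can be sketched generically as follows (the transition and cost tables in the usage example are small hypothetical placeholders, not the model of this chapter):

```python
def finite_horizon_vi(P, C, C_terminal):
    """Backward value iteration for a finite horizon SDP.

    P[k][u][i][j] : transition probability i -> j at stage k under decision u
    C[k][u][i]    : expected stage cost in state i at stage k under decision u
    C_terminal[i] : terminal cost of ending in state i
    Returns the cost-to-go table J and the optimal policy.
    """
    N, S = len(P), len(C_terminal)
    J = [[0.0] * S for _ in range(N + 1)]
    policy = [[0] * S for _ in range(N)]
    J[N] = list(C_terminal)
    for k in range(N - 1, -1, -1):
        for i in range(S):
            best_u, best_cost = None, float("inf")
            for u in range(len(P[k])):
                cost = C[k][u][i] + sum(
                    P[k][u][i][j] * J[k + 1][j] for j in range(S))
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[k][i], policy[k][i] = best_cost, best_u
    return J, policy

# Toy 2-state example (hypothetical numbers): state 0 = working, 1 = failed;
# u = 0 "do nothing" (may fail), u = 1 "replace" (always returns to working).
P_stage = [[[0.9, 0.1], [0.0, 1.0]],
           [[1.0, 0.0], [1.0, 0.0]]]
C_stage = [[0.0, 10.0], [5.0, 5.0]]
J, policy = finite_horizon_vi([P_stage] * 3, [C_stage] * 3, [0.0, 0.0])
```

The stationary toy data is repeated over the three stages here; in the model of this chapter the stage data would vary because the electricity price transitions are non-stationary.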
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries a large part of the electricity is based on hydropower. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

NE        Number of electricity scenarios
NW        Number of working states for the component
NPM       Number of preventive maintenance states for the component
NCM       Number of corrective maintenance states for the component

Costs

CE(s, k)  Electricity price at stage k in electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1        Component state at the current stage
i2        Electricity state at the current stage
j1        Possible component state for the next stage
j2        Possible electricity state for the next stage

State and Control Space

x1_k      Component state at stage k
x2_k      Electricity state at stage k

Probability functions

λ(t)      Failure rate of the component at age t
λ(i)      Failure rate of the component in state Wi

Sets

Ωx1       Component state space
Ωx2       Electricity state space
ΩU(i)     Decision space for state i

State notations

W         Working state
PM        Preventive maintenance state
CM        Corrective maintenance state
9.1.3 Assumptions
- The time span of the problem is T. It is divided into N stages of length Ts, such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
- The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
- If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.
- It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.
- If the system is not working, an interruption cost CI per stage is considered.
- The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).
- NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. The electricity price may switch from one scenario to another during the time span; the probability of transition at each stage is assumed known.
- A terminal cost (for stage N) can be used to penalize the terminal stage condition.
- The manpower is assumed unlimited. Spare parts are not considered.
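The assumptions above fix the model's parameters; collecting them in one structure makes the later formulas easier to implement. This is only an organizational sketch, and the field names are mine, not notation from the thesis:

```python
from dataclasses import dataclass, field

@dataclass
class OneComponentModel:
    """Parameters of the finite horizon replacement model (Section 9.1).
    Field names are illustrative, not taken from the thesis text."""
    N: int      # number of stages
    Ts: float   # stage length in hours, so T = N * Ts
    G: float    # average production of the unit (kW)
    N_PM: int   # stages needed for a preventive replacement
    N_CM: int   # stages needed for a corrective replacement
    C_PM: float # cost per stage of preventive maintenance
    C_CM: float # cost per stage of corrective maintenance
    C_I: float  # interruption cost per stage
    price: list = field(default_factory=list)  # price[s][k] plays the role of CE(s, k)

    @property
    def T(self):
        """Total time span covered by the model."""
        return self.N * self.Ts
```

A weekly-stage, one-year instance would for example use N = 52 and Ts = 168 hours.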
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k)^T,  x1_k ∈ Ωx1,  x2_k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component are NCM and NPM, respectively.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; Tmax can then correspond, for example, to the time after which λ(t) > 50%. The latter approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
[Figure omitted: state-transition diagram over the states W0, W1, W2, W3, W4, PM1, CM1, CM2. Under u = 0 (solid lines), each Wq leads to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q), W4 looping on itself; under u = 1 (dashed lines), each Wq leads to PM1; maintenance states chain back to W0 with probability 1.]

Figure 9.1: Example of Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0; dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}
Electricity scenario state

Electricity scenarios are associated with the state variable x2_k. There are NE possible states for this variable, each corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively a dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and moreover it is a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure omitted: electricity price (SEK/MWh, roughly 200 to 500) versus stage, around stages k−1, k, k+1, for three scenarios.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., W_NW},  ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(X_{k+1} = j | U_k = u, X_k = i)
  = P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
  = P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
  = P(j1, u, i1) · Pk(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q·Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1. Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0    CM1      λ(Wq)
W_NW                        0    W_NW     1 − λ(W_NW)
W_NW                        0    CM1      λ(W_NW)
Wq, q ∈ {0, ..., NW}        1    PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    1
PM_{NPM−1}                  ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    1
CM_{NCM−1}                  ∅    W0       1
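These transition probabilities can be assembled into one matrix per decision. The sketch below follows Table 9.1 and the structure of Figure 9.1; the state ordering and the representation of the per-stage failure probability as a plain array `p_fail` are my own assumptions:

```python
def build_transition_matrices(p_fail, n_pm, n_cm):
    """Stationary component transition matrices P0 (u = 0) and P1 (u = 1).

    State ordering (an assumed convention):
      0 .. n_w                 : W0 .. W_NW, with n_w = len(p_fail) - 1
      n_w+1 .. n_w+n_pm-1      : PM1 .. PM_{NPM-1}
      then                     : CM1 .. CM_{NCM-1}
    p_fail[q] is the per-stage failure probability in state Wq.
    """
    n_w = len(p_fail) - 1
    pm = [n_w + 1 + q for q in range(n_pm - 1)]        # PM1..PM_{NPM-1}
    cm = [n_w + n_pm + q for q in range(n_cm - 1)]     # CM1..CM_{NCM-1}
    S = n_w + n_pm + n_cm - 1
    P0 = [[0.0] * S for _ in range(S)]                 # u = 0: no preventive maintenance
    P1 = [[0.0] * S for _ in range(S)]                 # u = 1: start preventive maintenance
    for q in range(n_w + 1):
        nxt = min(q + 1, n_w)                          # W_NW loops on itself
        P0[q][nxt] = 1.0 - p_fail[q]
        P0[q][cm[0] if n_cm > 1 else 0] = p_fail[q]    # failure -> CM1 (or W0 if NCM = 1)
        P1[q][pm[0] if n_pm > 1 else 0] = 1.0          # decision -> PM1 (or W0 if NPM = 1)
    for seq in (pm, cm):                               # maintenance chains, probability 1
        for a, b in zip(seq, seq[1:]):
            P0[a][b] = P1[a][b] = 1.0
        if seq:
            P0[seq[-1]][0] = P1[seq[-1]][0] = 1.0      # last maintenance state -> W0
    return P0, P1
```

With `p_fail` of length 5, `n_pm = 2` and `n_cm = 3`, this reproduces the 8-state example of Figure 9.1.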
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = [ 1    0    0  ]      P2_E = [ 1/3  1/3  1/3 ]      P3_E = [ 0.6  0.2  0.2 ]
       [ 0    1    0  ]             [ 1/3  1/3  1/3 ]             [ 0.2  0.6  0.2 ]
       [ 0    0    1  ]             [ 1/3  1/3  1/3 ]             [ 0.2  0.2  0.6 ]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2)    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
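These non-stationary scenario dynamics are easy to simulate. The sketch below uses the matrix values and stage schedule of the example tables; the function name and structure are mine:

```python
import random

# Transition matrices of Table 9.2 and the 12-stage schedule of Table 9.3.
P1E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2E = [[1/3, 1/3, 1/3] for _ in range(3)]
P3E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def sample_scenario_path(s0, rng=random):
    """Sample one electricity scenario trajectory x2_0 .. x2_12.

    At each stage k the next scenario is drawn from row x2_k of the
    stage's transition matrix Pk(j2, i2)."""
    path = [s0]
    for Pk in schedule:
        s_next = rng.choices([0, 1, 2], weights=Pk[path[-1]])[0]
        path.append(s_next)
    return path
```

Because the first three stages use the identity matrix P1_E, every sampled path keeps its initial scenario through stage 3, matching the "stable outside the summer" interpretation discussed in Section 9.1.1.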
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

- Reward for electricity generation: G·Ts·CE(i2, k) (depends on the electricity scenario state i2 and the stage k).
- Cost for maintenance: CCM or CPM.
- Cost for interruption: CI.

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. A possible terminal cost is defined by CN(i) for each possible terminal state i of the component. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4: Transition costs

i1                          u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     G·Ts·CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0    CM1      CI + CCM
W_NW                        0    W_NW     G·Ts·CE(i2, k)
W_NW                        0    CM1      CI + CCM
Wq                          1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    CI + CPM
PM_{NPM−1}                  ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    CI + CCM
CM_{NCM−1}                  ∅    W0       CI + CCM
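For a component currently in a working state, the cost rows of Table 9.4 can be condensed into a small function. This is a sketch under one assumption the thesis leaves implicit: the production reward enters as a negative cost, so that minimizing total cost maximizes profit. The parameter dictionary keys are my own names:

```python
def stage_cost(working, failed, u, price, par):
    """Per-stage cost following Table 9.4, for a component currently in a
    working state W_q. `price` plays the role of CE(i2, k).
    Sign convention (an assumption): production is a negative cost."""
    if working and u == 0 and not failed:
        return -par["G"] * par["Ts"] * price      # electricity reward G*Ts*CE(i2, k)
    if working and u == 1:
        return par["CI"] + par["CPM"]             # preventive replacement starts
    return par["CI"] + par["CCM"]                 # failure during the stage -> CM
```

The PM and CM continuation states (cost CI + CPM and CI + CCM per stage until the chain returns to W0) would be handled the same way once the state encoding is fixed.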
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.
This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers

NC        Number of components
NWc       Number of working states for component c
NPMc      Number of preventive maintenance states for component c
NCMc      Number of corrective maintenance states for component c

Costs

CPMc      Cost per stage of preventive maintenance for component c
CCMc      Cost per stage of corrective maintenance for component c
CNc(i)    Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}      State of component c at the current stage
i_{NC+1}                  Electricity state at the current stage
jc, c ∈ {1, ..., NC}      State of component c at the next stage
j_{NC+1}                  Electricity state at the next stage
uc, c ∈ {1, ..., NC}      Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC}    State of component c at stage k
xc                        A component state
x_{NC+1,k}                Electricity state at stage k
uc_k                      Maintenance decision for component c at stage k

Probability functions

λc(i)     Failure probability function for component c

Sets

Ωxc           State space for component c
Ωx_{NC+1}     Electricity state space
Ωuc(ic)       Decision space for component c in state ic
9.2.3 Assumptions
- The system is composed of NC components in series. If one component fails, the whole system fails.
- The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.
- If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.
- It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.
- An interruption cost CI is considered whenever maintenance is done on the system.
- The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).
- A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

X_k = (x1_k, ..., x^{NC}_k, x^{NC+1}_k)^T    (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and x^{NC+1}_k represents the electricity state.
Component space

The numbers of CM and PM states for component c are NCMc and NPMc, respectively. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space of component c is denoted Ωxc:

xc_k ∈ Ωxc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u1_k, u2_k, ..., u^{NC}_k)^T    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., W_NWc},  ∅ otherwise
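The joint decision space is the Cartesian product of the per-component decision sets, which can be enumerated directly. In this sketch, a component with the empty decision set ∅ is represented by the single "no action" value 0 so that the product stays well-defined; that encoding, like the function and argument names, is my own assumption:

```python
from itertools import product

def decision_space(component_states, working_sets):
    """Enumerate all feasible decision vectors U_k = (u1, ..., uNC).

    working_sets[c] is the set of states of component c in which the
    decision {0, 1} is available; in any other state the component has
    no choice (encoded here as the single value 0)."""
    per_component = [
        (0, 1) if component_states[c] in working_sets[c] else (0,)
        for c in range(len(component_states))
    ]
    return list(product(*per_component))
```

For two components where the first is working and the second is in corrective maintenance, only two decision vectors remain, which is how opportunistic grouping decisions get enumerated.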
9.2.4.3 Transition Probability
The component state variables xc are independent of the electricity state x^{NC+1}. Consequently:

P(X_{k+1} = j | U_k = U, X_k = i)                                                      (9.4)
  = P((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) · P(j_{NC+1}, i_{NC+1})       (9.5)

The transition probabilities of the electricity state, P(j_{NC+1}, i_{NC+1}), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. Consequently, different cases must be considered.

Case 1: If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc}, then

P((j1, ..., j_NC), 0, (i1, ..., i_NC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2: If one of the components is in maintenance, or preventive maintenance is decided:

P((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., W_NWc}
P^c = 1             if ic ∈ {W1, ..., W_NWc}, uc = 0 and jc = ic
P^c = 0             otherwise
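The two cases can be combined into one function. This is a sketch under my reading of Case 2 (a working, non-maintained component keeps its state while the system is down); `P_component[c]` is a hypothetical one-component transition probability with the signature P(j, u, i) of Section 9.1:

```python
def system_transition_prob(js, us, iss, P_component, working_sets):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) for the multi-component model.

    working_sets[c] holds the working states W1..W_NWc of component c.
    Assumption (my reading of Case 2): when the system is down, a working,
    non-maintained component is frozen in its current state.
    """
    nc = len(js)
    all_working = all(iss[c] in working_sets[c] for c in range(nc))
    no_maintenance = not any(us)
    p = 1.0
    for c in range(nc):
        if all_working and no_maintenance:          # Case 1: every component ages
            p *= P_component[c](js[c], 0, iss[c])
        elif us[c] == 1 or iss[c] not in working_sets[c]:
            p *= P_component[c](js[c], 1, iss[c])   # maintenance starts or continues
        else:
            p *= 1.0 if js[c] == iss[c] else 0.0    # frozen component
    return p
```

The product form keeps the computation linear in the number of components, even though the joint state space itself grows exponentially.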
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1: If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc}, then

C((j1, ..., j_NC), 0, (i1, ..., i_NC)) = G·Ts·CE(i_{NC+1}, k)

Case 2: When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) = CI + Σ_{c=1}^{NC} C^c

with

C^c = CCMc  if ic ∈ {CM1, ..., CM_NCMc} or jc = CM1
C^c = CPMc  if ic ∈ {PM1, ..., PM_NPMc} or jc = PM1
C^c = 0     otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas that could impact the model:
- Manpower: it would be interesting to limit the number of maintenance actions that can be performed at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
- Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
- Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
- Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
- Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
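The hand computation can be checked mechanically. The following script runs the same backward recursion over the arc costs of the example (the node letters A..J are only labels for the stage/state pairs):

```python
# Backward value iteration for the shortest-path example.
# C[(k, i, j)] is the arc cost from state i at stage k to state j at stage k+1.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}
N = 4
J = {(N, 0): 0.0}                      # terminal cost phi(0) = 0
policy = {}
for k in range(N - 1, -1, -1):
    for i in {i for (kk, i, _) in C if kk == k}:
        # cost-to-go of each feasible successor state j
        options = {j: C[(k, i, j)] + J[(k + 1, j)]
                   for (kk, ii, j) in C if kk == k and ii == i}
        j_best = min(options, key=options.get)
        J[(k, i)] = options[j_best]
        policy[(k, i)] = j_best

print(J[(0, 0)])   # → 8.0, the optimal cost from node A
```

On ties (as at stage 1, state 1), `min` keeps the first minimizing successor, whereas the hand solution legitimately lists both.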
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A-H. Mohamed. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers / Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems: life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems: cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L. M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C. L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R. E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
is important. Many articles consider an infinite time horizon. More focus should be put on finite horizons, since they are more practical. Another characteristic of a model is its time representation: whether discrete or continuous time is considered. A further distinction can be made between models with deterministic and stochastic component lifetimes. Among the stochastic approaches, it is interesting to consider which kinds of lifetime distributions can be used.
The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.
Chapter 3
Introduction to the Power System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.
3.1 Power System Presentation
Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description
A simple description of the power system includes the following main parts:
1. Generation: The generation units that produce the power. These can be, e.g., hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: The transmission system is composed of high-voltage, high-power lines. This part of the system is in general meshed. The transmission system connects the distribution systems with the generation units.

3. Distribution: The distribution system is at a voltage level below transmission and connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: The consumers can be divided into different categories: industry, commercial, household, office, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs also depend on the duration of the outage.
The trade of electricity between producers and consumers is made through different specific markets around the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.
The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.
3.1.2 Maintenance in Power Systems
The objective is to find the right way to do maintenance: Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to finding a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at the KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).
Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more
attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems, and [22], [30] for transmission systems).
The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
3.2 Costs
Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: Cost for the maintenance team that performs the maintenance actions.

• Spare part cost: The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: Special equipment may be needed for undertaking the maintenance. A helicopter, for example, can sometimes be necessary for the maintenance of some parts of an off-shore wind turbine.

• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: The size and availability of the maintenance staff is limited.

• Maintenance equipment: The equipment needed for undertaking the maintenance must be available.

• Weather: The weather can cause certain maintenance actions to be postponed; e.g., in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of spare parts: If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:
An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the actual evolution of the system and the possible actions.
Basically, in maintenance problems this would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.
If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.
Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would, for example, be to minimize the maintenance costs during the time horizon considered.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner over time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time
In this thesis, we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the interval length between two stages will have an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 5. Continuous decision making refers to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example: a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space \Omega_{X_k}. Depending on the state of the system, the decision maker decides on an action u = U_k \in \Omega_{U_k}(i).
Dynamic and Cost Functions
As a result of this action, the state of the system at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).
Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:
J_0^*(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, ..., N-1
N : number of stages
k : stage
i : state at the current stage
j : state at the next stage
X_k : state at stage k
U_k : decision/action at stage k
C_k(i, u) : cost function
C_N(i) : terminal cost for state i
f_k(i, u) : dynamic function
J_0^*(i) : optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:
J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right]   (4.1)

J_k^*(i) : optimal cost-to-go from stage k to N, starting from state i
The value iteration algorithm is a direct consequence of the optimality equation:
J_N^*(i) = C_N(i), \forall i \in \Omega_{X_N}

J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right], \forall i \in \Omega_{X_k}

U_k^*(i) = \arg\min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right], \forall i \in \Omega_{X_k}

u : decision variable
U_k^*(i) : optimal decision/action at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
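The backward recursion above can be sketched in code as follows. This is a minimal, generic sketch: the state spaces, controls, dynamics and costs are supplied as plain Python callables, and the tiny model at the end (drive an integer state towards zero) is a hypothetical example, not taken from the thesis.

```python
# Generic finite-horizon deterministic value iteration (a sketch of the
# backward algorithm above).
def value_iteration(N, states, controls, f, c, c_N):
    """states[k]: iterable of states at stage k (k = 0..N);
    controls(k, i): admissible controls for state i at stage k;
    f(k, i, u): next state; c(k, i, u): stage cost; c_N(i): terminal cost."""
    J = {i: c_N(i) for i in states[N]}           # initialization: J_N = C_N
    policy = {}
    for k in range(N - 1, -1, -1):               # backwards; stops after k = 0
        J_k = {}
        for i in states[k]:
            u_best = min(controls(k, i),
                         key=lambda u: c(k, i, u) + J[f(k, i, u)])
            J_k[i] = c(k, i, u_best) + J[f(k, i, u_best)]
            policy[(k, i)] = u_best
        J = J_k
    return J, policy                             # J is J_0, the optimal cost-to-go

# Hypothetical toy model: states -2..2, controls -1/0/+1, cost |x| per stage.
states = {k: range(-2, 3) for k in range(4)}
J, pol = value_iteration(
    3, states,
    controls=lambda k, i: [-1, 0, 1],
    f=lambda k, i, u: max(-2, min(2, i + u)),    # saturated random-walk dynamics
    c=lambda k, i, u: abs(i),
    c_N=lambda i: abs(i),
)
print(J[2], pol[(0, 2)])   # cost 3 from state 2; the first optimal action is -1
```

The dictionaries `J` and `policy` here play the roles of J_k^*(i) and U_k^*(i) above.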
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: the shortest path network. Stage 0 contains node A; stage 1 nodes B, C, D; stage 2 nodes E, F, G; stage 3 nodes H, I, J; stage 4 node K. Each arc between nodes of consecutive stages is labeled with its cost, e.g. 2 on the arc A ⇒ B.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:

\Omega_{X_0} = {A} = {0}, \Omega_{X_1} = {B, C, D} = {0, 1, 2}, \Omega_{X_2} = {E, F, G} = {0, 1, 2},
\Omega_{X_3} = {H, I, J} = {0, 1, 2}, \Omega_{X_4} = {K} = {0}
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notation is used:
\Omega_{U_k}(i) = {0, 1} for i = 0, {0, 1, 2} for i = 1, {1, 2} for i = 2, for k = 1, 2, 3

\Omega_{U_0}(0) = {0, 1, 2} for k = 0
For example, \Omega_{U_1}(0) = \Omega_U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E, or U_1(0) = 1 for the transition B ⇒ F.
Another example: \Omega_{U_1}(2) = \Omega_U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F, or u_1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ_0^*, μ_1^*, ..., μ_N^*}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notation used: f_k(i, u) = u.
The transition costs are defined as equal to the distance from one state to the state resulting from the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J_0^*(0) = \min_{U_k \in \Omega_{U_k}(X_k)} \left[ \sum_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) \right]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, 2, 3
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem. The algorithm is initiated from the last stage and then iterated backwards until
the initial state is reached. The optimal decision sequence is then obtained forwards, using the optimal solutions determined by the DP algorithm for the sequence of states that is visited.
The solutions of the algorithm are given in Appendix A.
The optimal cost-to-go is J_0^*(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u_k^*(i) (for example, μ_1(1) = 2 and μ_1(2) = 2).
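The example can be reproduced numerically as follows. Only the arc costs on the two paths discussed in the text (A-B-F-J-K = 2+6+2+7 = 17 and the optimal A-D-G-I-K = 8) are consistent with the thesis; the remaining arc costs below are assumed for illustration, since they are not recoverable from the figure.

```python
# Backward value iteration on the five-stage network of the example.
# Arc costs not stated in the text are hypothetical.
INF = float("inf")

nodes = [["A"], ["B", "C", "D"], ["E", "F", "G"], ["H", "I", "J"], ["K"]]

# cost[k][(i, j)] = cost of the arc from node i (stage k) to node j (stage k+1)
cost = [
    {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 2},
    {("B", "E"): 4, ("B", "F"): 6, ("C", "E"): 3, ("C", "F"): 2,
     ("C", "G"): 5, ("D", "F"): 4, ("D", "G"): 1},
    {("E", "H"): 5, ("E", "I"): 7, ("F", "H"): 3, ("F", "I"): 4,
     ("F", "J"): 2, ("G", "I"): 2, ("G", "J"): 3},
    {("H", "K"): 2, ("I", "K"): 3, ("J", "K"): 7},
]

# J[node] = optimal cost-to-go to K; policy[node] = best successor node
J = {"K": 0.0}
policy = {}
for k in range(3, -1, -1):                      # backward recursion
    for i in nodes[k]:
        J[i], policy[i] = min(
            ((c + J[j], j) for (a, j), c in cost[k].items() if a == i),
            default=(INF, None))

# Recover the optimal path forwards from A
path, node = ["A"], "A"
while node != "K":
    node = policy[node]
    path.append(node)

print(J["A"], path)   # with these assumed costs: 8.0 and A-D-G-I-K
```

This mirrors the two-pass structure described above: a backward sweep to compute J_k^*, then a forward pass to read off the optimal path.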
Chapter 5
Finite Horizon Models
In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as follows:
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k \in \Omega_{X_k}.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on
the stage: u \in \Omega_{U_k}(i).
Dynamics of the System and Transition Probabilities
Contrary to the deterministic case, the state transition depends not only on the control used but also on a disturbance ω = ω_k(i, u):
X_{k+1} = f_k(X_k, U_k, \omega), k = 0, 1, ..., N-1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control at stage k are i and u. These probabilities can also depend on the stage:
P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:
P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
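To illustrate the last point, consider a hypothetical three-state deterioration model (all numbers are invented for the example, not taken from the thesis). Once a control μ(i) is fixed for every state, the rows P(·, μ(i), i) form the transition matrix of an ordinary Markov chain:

```python
import numpy as np

# Hypothetical 3-state deterioration model: 0 = good, 1 = worn, 2 = failed.
# P[u][i][j] = P(X_{k+1} = j | X_k = i, U_k = u), actions 0 = wait, 1 = replace.
P = np.array([
    [[0.8, 0.2, 0.0],    # wait: the component deteriorates stochastically
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0],    # replace: the component returns to the good state
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]],
])

# Fixing a control for each state yields a Markov chain; here the policy
# "replace only on failure": mu = (wait, wait, replace).
mu = [0, 0, 1]
P_mu = np.array([P[mu[i]][i] for i in range(3)])
print(P_mu)              # row i is the distribution of the next state from i
```

Each row of `P_mu` sums to one, as required of a Markov transition matrix.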
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:
C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)
If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(i, u, j).
A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J^*(X_0), where X_0 is the initial state of the system:
J^*(X_0) = \min_{U_k \in \Omega_{U_k}(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]

subject to X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), k = 0, 1, ..., N-1
N : number of stages
k : stage
i : state at the current stage
j : state at the next stage
X_k : state at stage k
U_k : decision/action at stage k
\omega_k(i, u) : probabilistic function of the disturbance
C_k(i, u, j) : cost function
C_N(i) : terminal cost for state i
f_k(i, u, \omega) : dynamic function
J_0^*(i) : optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:
J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} E\left[ C_k(i, u) + J_{k+1}^*(f_k(i, u, \omega)) \right]   (5.1)
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right]   (5.2)

\Omega_{X_k} : state space at stage k
\Omega_{U_k}(i) : decision space at stage k for state i
P_k(j, u, i) : transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
J_N^*(i) = C_N(i), \forall i \in \Omega_{X_N}   (initialization)

While k ≥ 0 do
    J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right], \forall i \in \Omega_{X_k}
    U_k^*(i) = \arg\min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right], \forall i \in \Omega_{X_k}
    k ← k - 1

u : decision variable
U_k^*(i) : optimal decision/action at stage k for state i
The recursion finishes when the first stage is reached.
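The stochastic backward recursion can be sketched as follows, on a hypothetical three-state deterioration model with actions wait/replace. All transition probabilities and costs below are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

N = 10                     # number of stages
# States 0 = good, 1 = worn, 2 = failed; actions 0 = wait, 1 = replace.
# P[u, i, j] = P(X_{k+1} = j | X_k = i, U_k = u)  (stationary for simplicity)
P = np.array([[[0.8, 0.2, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],
              [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]])
# C[u, i] = expected immediate cost: waiting is free except in the failed
# state (downtime 10 per stage); replacement costs 5, corrective 15.
C = np.array([[0.0, 0.0, 10.0], [5.0, 5.0, 15.0]])
C_N = np.zeros(3)          # no terminal cost

J = C_N.copy()             # J_N
policy = np.zeros((N, 3), dtype=int)
for k in range(N - 1, -1, -1):             # backward recursion
    Q = C + P @ J                          # Q[u, i] = C(i,u) + E[J_{k+1}(j)]
    policy[k] = np.argmin(Q, axis=0)       # U_k^*(i)
    J = np.min(Q, axis=0)                  # J_k^*(i)

print(J)          # optimal expected cost-to-go from each state at stage 0
print(policy[0])  # optimal action in each state at the first stage
```

The matrix product `P @ J` computes the expectation over the next state j in equation (5.2) for all states and actions at once.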
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:

• N stages;

• N_X state variables, where the size of the set for each state variable is S;

• N_U control variables, where the size of the set for each control variable is A.
The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem increases exponentially with the size of the problem (the number of state and decision variables). This characteristic of SDP is called the curse of dimensionality.
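A rough numerical illustration of this growth (all problem sizes below are hypothetical, e.g. one decision per week over a year gives N = 52):

```python
# Operation count N * S**(2*N_X) * A**(N_U) of value iteration,
# evaluated for hypothetical problem sizes.
def vi_operations(N, S, N_X, A, N_U):
    return N * S**(2 * N_X) * A**N_U

# One component with 5 deterioration levels and a binary decision:
print(vi_operations(N=52, S=5, N_X=1, A=2, N_U=1))    # 2600 operations

# Ten such components treated jointly: the count explodes to ~5.1e18.
print(vi_operations(N=52, S=5, N_X=10, A=2, N_U=10))
```

Going from one component to ten multiplies the work by a factor of about 2e15, which is the curse of dimensionality in concrete terms.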
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for a component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.
Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption, and if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions at offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).
This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on only a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
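The state augmentation described above can be sketched as follows, for a hypothetical deterioration variable with three levels (the pair state and the level count are assumptions made for the example):

```python
from itertools import product

levels = [0, 1, 2]   # hypothetical deterioration levels

# If X_{k+1} depends on both X_k and X_{k-1}, define an augmented DP state
# Y_k = (X_{k-1}, X_k); the dynamics of Y_k are then Markovian again.
augmented = list(product(levels, levels))
print(len(levels), len(augmented))   # 3 base states become 9 augmented states

# Remembering m preceding stages multiplies the state space size by |levels|**m:
for m in range(4):
    print(m, len(levels) ** (m + 1))
```

This makes the "very high computational price" concrete: each remembered stage multiplies the state space, and hence the value iteration cost, by another factor of the per-variable state count.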
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introduction chapter of [13] are recommended.
In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space. For every
i ∈ \Omega_X, μ(i) is an admissible control for the state i: μ(i) ∈ \Omega_U(i).
The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that the system cannot avoid reaching. When this state is reached, the system remains in it and no further costs are paid:
J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]

subject to X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), k = 0, 1, ...

\mu : decision policy
J^*(i) : optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The cost at stage k for discounted IHSDP has the form α^k · C_{ij}(u). As C_{ij}(u) is bounded, the infinite sum converges (as a decreasing geometric progression):
J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]

subject to X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), k = 0, 1, ...

\alpha : discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting. To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:
J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]

subject to X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), k = 0, 1, ...
6.2 Optimality Equations
The optimality equations are formulated using the transition probability function P(j, u, i).

The stationary policy μ* that solves an IHSDP shortest path problem satisfies Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):
J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + J*(j)], ∀i ∈ Ω_X

J_μ(i): cost-to-go function of policy μ starting from state i
J*(i): optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + α · J*(j)], ∀i ∈ Ω_X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea is to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown that it converges to the optimal solution. If the model is discounted, the method can be fast: its time complexity is polynomial in the size of the state space, the size of the control space and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative convergence criterion must be defined to stop the algorithm.

An alternative is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
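As an illustration (not part of the thesis), value iteration for the discounted optimality equation can be sketched in Python; the array layout P[u][i][j] for transition probabilities and C[u][i][j] for transition costs is an assumption made for the example:

```python
import numpy as np

def value_iteration(P, C, alpha, tol=1e-8):
    """Value iteration for a discounted MDP.

    P[u][i][j]: transition probability from state i to j under control u
    C[u][i][j]: transition cost
    alpha: discount factor (0 < alpha < 1)
    """
    n_u, n_s, _ = P.shape
    J = np.zeros(n_s)
    while True:
        # Q[u, i] = expected one-stage cost plus discounted cost-to-go
        Q = np.einsum('uij,uij->ui', P, C) + alpha * np.einsum('uij,j->ui', P, J)
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, Q.argmin(axis=0)  # cost-to-go and greedy policy
        J = J_new
```

The stopping test is the relative criterion mentioned above: iteration stops once successive cost-to-go estimates differ by less than a tolerance.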
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively; the process stops when a policy is the solution of its own improvement.

The algorithm starts with an initial policy μ_0 and can then be described by the following steps.
Step 1 Policy Evaluation
If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μq}(i) is calculated as the solution of the following linear system:

J_{μq}(i) = Σ_{j ∈ Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_{μq}(j)], ∀i ∈ Ω_X

q: iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system under policy μ_q.
Step 2 Policy Improvement
A new policy is obtained using one value iteration step:

μ_{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_{μq}(j)], ∀i ∈ Ω_X

Then go back to the policy evaluation step. The process stops when μ_{q+1} = μ_q.
At each iteration the algorithm improves the policy. If the initial policy μ_0 is already good, the algorithm converges quickly to the optimal solution.
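The two steps above can be sketched as follows (an illustrative Python sketch for the discounted case, not from the thesis; the array layout P[u][i][j], C[u][i][j] is an assumption, and the evaluation step solves the linear system directly):

```python
import numpy as np

def policy_iteration(P, C, alpha, mu0=None):
    """Policy iteration for a discounted MDP.

    P[u][i][j]: transition probabilities, C[u][i][j]: transition costs.
    """
    n_u, n_s, _ = P.shape
    mu = np.zeros(n_s, dtype=int) if mu0 is None else mu0.copy()
    expected_C = np.einsum('uij,uij->ui', P, C)  # expected one-stage cost
    while True:
        # Step 1: policy evaluation -- solve the linear system
        # J(i) = c_mu(i) + alpha * sum_j P(j, mu(i), i) * J(j)
        P_mu = P[mu, np.arange(n_s), :]      # transition rows under mu
        c_mu = expected_C[mu, np.arange(n_s)]
        J = np.linalg.solve(np.eye(n_s) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = expected_C + alpha * np.einsum('uij,j->ui', P, J)
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):       # policy is its own improvement
            return J, mu
        mu = mu_new
```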
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the cost-to-go function of the policy. The algorithm is initialized with a value function J^M_{μk}(i) that must be chosen higher than the true value J_{μk}(i).
While m ≥ 0 do:

J^m_{μk}(i) = Σ_{j ∈ Ω_X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_{μk}(j)], ∀i ∈ Ω_X

m ← m − 1

m: number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μk} is approximated by J^0_{μk}.
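The truncated evaluation step can be sketched as follows (an illustrative Python sketch for a discounted cost, not from the thesis; arrays P[u][i][j] and C[u][i][j] are assumed, and M value-iteration sweeps replace the exact solution of the linear system):

```python
import numpy as np

def evaluate_policy_truncated(P, C, alpha, mu, J_init, M):
    """Approximate policy evaluation by M value-iteration sweeps
    (the evaluation step of modified policy iteration)."""
    n_s = P.shape[1]
    P_mu = P[mu, np.arange(n_s), :]                       # rows under mu
    c_mu = np.einsum('ij,ij->i', P_mu, C[mu, np.arange(n_s), :])
    J = J_init.copy()                                     # start above the true value
    for _ in range(M):                                    # m = M-1, ..., 0
        J = c_mu + alpha * P_mu @ J
    return J
```

For M large enough the estimate approaches the exact solution of the linear system, since each sweep is a contraction with factor α.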
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain, that is, if every stationary policy generates a Markov chain consisting of a single ergodic class and possibly some transient states (see [36] for details).
Given a stationary policy μ and an arbitrary reference state X ∈ Ω_X, there exist a unique scalar λ_μ and vector h_μ such that

h_μ(X) = 0
λ_μ + h_μ(i) = Σ_{j ∈ Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)], ∀i ∈ Ω_X
This λ_μ is the average cost-to-go of the stationary policy μ; it is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation:
λ* + h*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X

μ*(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. Let X be an arbitrary reference state and let h^0(i) be chosen arbitrarily.

H^k = min_{u ∈ Ω_U(X)} Σ_{j ∈ Ω_X} P(j, u, X) · [C(j, u, X) + h^k(j)]
h^{k+1}(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)] − H^k, ∀i ∈ Ω_X

μ^{k+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)], ∀i ∈ Ω_X
The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
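A sketch of the iteration above (illustrative Python, not from the thesis; a unichain MDP given as arrays P[u][i][j] and C[u][i][j] is assumed):

```python
import numpy as np

def relative_value_iteration(P, C, ref=0, tol=1e-9, max_iter=10_000):
    """Relative value iteration for an average cost-per-stage MDP
    (assumes the MDP is unichain)."""
    n_u, n_s, _ = P.shape
    expected_C = np.einsum('uij,uij->ui', P, C)
    h = np.zeros(n_s)
    for _ in range(max_iter):
        Q = expected_C + np.einsum('uij,j->ui', P, h)
        H = Q[:, ref].min()                 # offset taken at the reference state
        h_new = Q.min(axis=0) - H
        if np.max(np.abs(h_new - h)) < tol:
            return H, h_new, Q.argmin(axis=0)  # average cost, bias, policy
        h = h_new
    return H, h, Q.argmin(axis=0)
```

At convergence H approximates the optimal average cost per stage λ*.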
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: the reference state X can be chosen arbitrarily.

Step 1: Policy evaluation
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i), ∀i ∈ Ω_X, stop the algorithm. Else, solve the system of equations

h_q(X) = 0
λ_q + h_q(i) = Σ_{j ∈ Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)], ∀i ∈ Ω_X

Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h_q(j)], ∀i ∈ Ω_X

q ← q + 1; go back to Step 1.
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to express in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, for the discounted IHSDP with optimality equation

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize Σ_{i ∈ Ω_X} J(i)
Subject to J(i) − α · Σ_{j ∈ Ω_X} P(j, u, i) · J(j) ≤ Σ_{j ∈ Ω_X} P(j, u, i) · C(j, u, i), ∀u, i
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
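The LP formulation can be sketched as follows (an illustrative Python sketch using SciPy's linprog, not from the thesis; the array layout P[u][i][j], C[u][i][j] is an assumption made for the example):

```python
import numpy as np
from scipy.optimize import linprog

def lp_solve_discounted(P, C, alpha):
    """Solve a discounted MDP by linear programming:
    maximize sum_i J(i) subject to
    J(i) - alpha * sum_j P(j,u,i) J(j) <= sum_j P(j,u,i) C(j,u,i), for all (i, u).
    """
    n_u, n_s, _ = P.shape
    expected_C = np.einsum('uij,uij->ui', P, C)
    # One inequality row per (action, state) pair
    A, b = [], []
    for u in range(n_u):
        for i in range(n_s):
            row = -alpha * P[u, i, :]
            row[i] += 1.0
            A.append(row)
            b.append(expected_C[u, i])
    # linprog minimizes, so minimize -sum_i J(i); J is unrestricted in sign
    res = linprog(c=-np.ones(n_s), A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(None, None)] * n_s)
    return res.x
```

Additional constraints on J could be appended to the same constraint matrix, which is the motivation for the LP approach mentioned above.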
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.

Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m, and is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quickly if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. These kinds of problems are referred to as Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem belongs to optimal control theory).

SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how an SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is a machine learning approach that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further reading, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. This chapter presents methods that overcome this problem by approximation, making use of supervised learning techniques.

Supervised learning investigates the construction of functions from training data (input-output pairs) in order to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require an explicit model of the system to exist; the methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space, and are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k), where X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_{k+1}, U_k, X_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6, and can be seen as analogous to modified policy iteration.

The cost-to-go function is estimated using the costs observed during simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.

Policy evaluation by simulation: assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.
The cost-to-go resulting from the trajectory starting from state X_k is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, then J(i) can be estimated by

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m): cost-to-go of the trajectory portion starting from state i at its m-th visit
A recursive form of the method can be formulated:

J(i) ← J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) ← J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

where γ_{X_k} corresponds to 1/m, m being the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).

At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assume that the l-th transition has just been generated; then J(X_k) is updated for all states visited previously during the trajectory:

J(X_k) ← J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) ← J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J(X_k) ← J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
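The TD(0) update can be sketched as follows (an illustrative Python sketch, not from the thesis; the simulator function sample_transition is a hypothetical stand-in for one step of the system under the fixed policy being evaluated):

```python
import random

def td0_evaluate(sample_transition, n_states, terminal, n_trajectories=2000):
    """TD(0) policy evaluation for a stochastic shortest path problem.

    sample_transition(state) -> (next_state, cost) simulates one step
    under the fixed policy being evaluated.
    """
    J = [0.0] * n_states
    visits = [0] * n_states
    for _ in range(n_trajectories):
        x = random.randrange(n_states)   # arbitrary start state
        while x != terminal:
            x_next, cost = sample_transition(x)
            visits[x] += 1
            gamma = 1.0 / visits[x]      # step size 1/m, as above
            J[x] += gamma * (cost + J[x_next] - J[x])
            x = x_next
    return J
```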
Q-factors
Once J_{μk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_{μk}(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_{μk}(j)]

Note that P(j, u, i) and C(j, u, i) must be known for this step. The improved policy is

μ_{k+1}(i) = argmin_{u ∈ Ω_U(i)} Q_{μk}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μk} and Q_{μk} have been estimated from samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω_U(i)} Q*(i, u)    (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + min_{v ∈ Ω_U(j)} Q*(j, v)]    (7.3)
Q*(i, u) is the unique solution of this equation, and the Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u ∈ Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) ← (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The exploration/exploitation trade-off: convergence of the algorithm to the optimal solution would require every pair (i, u) to be tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
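The update rule and the trade-off above can be sketched together as tabular Q-learning with an ε-greedy exploration policy (an illustrative Python sketch for the discounted variant, not from the thesis; the simulator sample_step is a hypothetical stand-in for the system or its Monte Carlo model):

```python
import random

def q_learning(sample_step, n_states, n_actions, n_steps=50_000,
               alpha_discount=0.9, eps=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    sample_step(i, u) -> (j, cost) simulates one transition.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    counts = [[0] * n_actions for _ in range(n_states)]
    x = 0
    for _ in range(n_steps):
        # exploration/exploitation trade-off
        if random.random() < eps:
            u = random.randrange(n_actions)                    # explore
        else:
            u = min(range(n_actions), key=lambda a: Q[x][a])   # greedy
        x_next, cost = sample_step(x, u)
        counts[x][u] += 1
        gamma = 1.0 / counts[x][u]                             # decreasing step size
        target = cost + alpha_discount * min(Q[x_next])
        Q[x][u] = (1 - gamma) * Q[x][u] + gamma * target
        x = x_next
    return Q
```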
7.3 Indirect Learning
Online applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section on each sample of experience;

- building the model of the transition probabilities and cost function online, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the previous sections, the cost-to-go or Q-functions are represented in tabular form. These approaches are suitable for moderate-size problems; for large state and control spaces they are too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that is optimized based on the available samples of J_μ. In the table representation investigated previously, J_μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.
A general approach to a supervised learning problem is:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Choose a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a true training set does not exist: the training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
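As a minimal illustration of the idea (not from the thesis: a linear approximator J̃(i, r) = rᵀφ(i) with hand-picked, hypothetical features, fitted by least squares to sampled cost-to-go values):

```python
import numpy as np

def fit_linear_cost_to_go(states, targets, phi):
    """Fit J~(i, r) = r . phi(i) by least squares on samples (i, V(i)).

    states: list of sampled states; targets: sampled cost-to-go values V(i)
    phi: feature map, state -> feature vector
    """
    Phi = np.array([phi(s) for s in states])
    r, *_ = np.linalg.lstsq(Phi, np.array(targets), rcond=None)
    return lambda s: float(np.dot(r, phi(s)))   # the approximation J~(., r)

# hypothetical example: quadratic features of a scalar "age" state
phi = lambda s: np.array([1.0, s, s * s])
```

Only the weight vector r is stored, instead of one table entry per state, which is the point made above.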
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs, with penalties defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The state of each unit is the number of remaining stages of maintenance, including the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a three-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. Preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process model of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm; the model is proved to be unichain before the algorithm is applied. An illustrative example is given, considering three deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP model. Major and minor maintenance are possible, and for each possible maintenance action the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Process
Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantage put forward is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality: the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite horizon dynamic programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance optimization and scheduling
- Method: value iteration
- Disadvantage: limited state space (number of components)

Markov decision processes (stationary models, classical methods)
- Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI), which can converge fast for a high discount factor
- Discounted: short-term maintenance optimization; policy iteration (PI), faster in general
- Shortest path: linear programming, which allows additional constraints but is limited to smaller state spaces than VI and PI

Approximate dynamic programming
- Characteristics: can handle large state spaces compared with classical MDP methods
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantage: can work without an explicit model

Semi-Markov decision processes
- Characteristics: can optimize the inspection interval (average cost-to-go approach)
- Possible application: optimization of inspection-based maintenance
- Methods: same as MDP
- Disadvantage: complex
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that can influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices. Conversely, if a high electricity price is expected in the near future, it can be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea is incorporated in the model: the electricity price is included as a state variable. The variable represents different electricity scenarios, for example high, medium and low prices. Within each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power, and the electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation can be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption can be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

N_E: number of electricity scenarios
N_W: number of working states for the component
N_PM: number of preventive maintenance states for one component
N_CM: number of corrective maintenance states for one component

Costs

C_E(s, k): electricity cost at stage k for electricity state s
C_I: cost per stage for interruption
C_PM: cost per stage of preventive maintenance
C_CM: cost per stage of corrective maintenance
C_N(i): terminal cost if the component is in state i

Variables

i1: component state at the current stage
i2: electricity state at the current stage
j1: possible component state for the next stage
j2: possible electricity state for the next stage

State and control space

x1_k: component state at stage k
x2_k: electricity state at stage k

Probability functions

λ(t): failure rate of the component at age t
λ(i): failure rate of the component in state W_i

Sets

Ω_x1: component state space
Ω_x2: electricity state space
Ω_U(i): decision space for state i

State notations

W: working state
PM: preventive maintenance state
CM: corrective maintenance state
9.1.3 Assumptions
bull The time span of the problem is T It is divided into N stages of length Tssuch that T = N middotTs The maintenance decision are made sequentially at eachstage k=01N-1
bull The failure rate of the component over the time is assumed perfectly knownThis function is denoted λ(t)
bull If the component fails during stage k corrective maintenance is undertakenfor NCM stages with a cost of CCM per stage
bull It is possible at each stage to decide to replace the component to preventcorrective maintenance The time of preventive replacement is NPM stageswith a cost of CPM per stage
bull If the system is not working a cost for interruption CI per stage is considered
bull The average production of the generating unit is G kW It means that if theunit is not in preventive maintenance or failure G middot Ts kWh are producedduring the stage (Ts in hours)
bull NE possible electricity price scenarios are considered The prices are supposedfixed during a stage (equal to the price at the beginning of scenario) Forscenario s the electricity price per kWh is noted CE(s k) k=01N-1 It ispossible that the electricity price switch from one scenario to another oneduring the time span The probability of transition at each stage is assumedknown
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; Tmax can then correspond, for example, to the age at which λ(t) exceeds a chosen limit. The latter approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
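As an illustration, the truncation above can be sketched as follows. The Weibull failure-rate parameters, the stage length and the cap value are hypothetical choices for the sketch, not values from the thesis.

```python
Ts = 1.0            # stage length (hypothetical unit)
LAMBDA_MAX = 0.5    # cap: lambda(t) is held constant once it reaches this value

def failure_rate(t, beta=2.0, eta=10.0):
    """Hypothetical Weibull hazard rate: lambda(t) = (beta/eta) * (t/eta)**(beta-1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

# Find Tmax, the first stage boundary at which lambda(t) reaches the cap
t = 0.0
while failure_rate(t + Ts) < LAMBDA_MAX:
    t += Ts
Tmax = t + Ts

def truncated_rate(t):
    """Failure rate used by the model: constant beyond Tmax."""
    return failure_rate(min(t, Tmax))

# Number of working states W0..W_NW (closest integer to Tmax/Ts)
NW = round(Tmax / Ts)
```

With these placeholder parameters the cap is reached at age 25, so the component state space needs 25 working states plus the PM and CM states.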
[Figure 9.1 depicts the Markov chain for the component state: each working state Wq ages to Wq+1 with probability (1 − Ts·λ(q)) and fails into CM1 with probability Ts·λ(q); CM1 → CM2 → W0 and PM1 → W0 with probability 1.]

Figure 9.1: Example of a Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0; dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}
Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and moreover a cheap source of energy. Consequently, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2 plots the electricity price (SEK/MWh, between 200 and 500) over stages k−1, k, k+1 for Scenarios 1, 2 and 3.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, ∅ otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                           u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0   CM1      λ(Wq)
WNW                          0   WNW      1 − λ(WNW)
WNW                          0   CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}         1   PM1      1
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1    1
PM(NPM−1)                    ∅   W0       1
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1    1
CM(NCM−1)                    ∅   W0       1
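These transition probabilities can be encoded directly. The sketch below builds the two stationary transition matrices (one per decision) for the Figure 9.1 example with NW = 4, NPM = 2, NCM = 3; the per-stage failure probabilities stand in for Ts · λ(q · Ts) and are hypothetical placeholders.

```python
NW, NPM, NCM = 4, 2, 3
# Hypothetical per-stage failure probabilities Ts * lambda(q * Ts), q = 0..NW
p_fail = [0.05, 0.08, 0.12, 0.18, 0.25]

# State indices: W0..W4 -> 0..4, PM1 -> 5, CM1 -> 6, CM2 -> 7
n = (NW + 1) + (NPM - 1) + (NCM - 1)   # 8 states in total
PM1, CM1, CM2 = 5, 6, 7

def transition_matrix(u):
    """Stationary component transition probabilities P(j | u, i)."""
    P = [[0.0] * n for _ in range(n)]
    for q in range(NW + 1):
        if u == 1:
            P[q][PM1] = 1.0                 # preventive replacement starts
        else:
            j = q + 1 if q < NW else NW     # ageing; W_NW stays in W_NW
            P[q][j] = 1.0 - p_fail[q]
            P[q][CM1] = p_fail[q]           # failure during the stage
    # Maintenance states progress deterministically back to a new component W0
    P[PM1][0] = 1.0
    P[CM1][CM2] = 1.0
    P[CM2][0] = 1.0
    return P

P0, P1 = transition_matrix(0), transition_matrix(1)
```

Each row of both matrices sums to one, which is a quick sanity check that the table covers all non-zero transitions.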
Table 9.2: Example of transition matrices for electricity scenarios

P1E = | 1  0  0 |      P2E = | 1/3  1/3  1/3 |      P3E = | 0.6  0.2  0.2 |
      | 0  1  0 |            | 1/3  1/3  1/3 |            | 0.2  0.6  0.2 |
      | 0  0  1 |            | 1/3  1/3  1/3 |            | 0.2  0.2  0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. A possible terminal cost is defined by CN(i) for each possible terminal state i of the component. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4: Transition costs

i1                           u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0   CM1      CI + CCM
WNW                          0   WNW      G · Ts · CE(i2, k)
WNW                          0   CM1      CI + CCM
Wq                           1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1    CI + CPM
PM(NPM−1)                    ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1    CI + CCM
CM(NCM−1)                    ∅   W0       CI + CCM
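A direct encoding of the transition costs as a stage-cost function is sketched below. The numeric costs, the price function and the sign convention (production reward counted as a negative cost, so that the whole problem is a minimization) are illustrative assumptions, not values from the thesis.

```python
# Hypothetical cost data for the one-component model
G, Ts = 1000.0, 1.0          # average production (kW) and stage length (h)
C_I, C_PM, C_CM = 500.0, 200.0, 800.0

def electricity_price(s, k):
    """Hypothetical electricity price per kWh, CE(s, k), for scenario s at stage k."""
    return {0: 0.50, 1: 0.35, 2: 0.25}[s]

def stage_cost(i1, u, j1, i2, k):
    """Transition cost Ck(j, u, i), with the production reward counted as a
    negative cost. For PM/CM states the decision space is empty; 0 is passed
    as a placeholder decision."""
    if u == 1:                                 # preventive replacement starts
        return C_I + C_PM
    if i1.startswith("PM"):                    # ongoing preventive maintenance
        return C_I + C_PM
    if i1.startswith("CM"):                    # ongoing corrective maintenance
        return C_I + C_CM
    if j1 == "CM1":                            # working component failed
        return C_I + C_CM
    return -G * Ts * electricity_price(i2, k)  # reward for produced energy
```

For example, a working component that survives a high-price stage earns 1000 · 1 · 0.50 = 500 in revenue, while starting a replacement costs the interruption plus the PM stage cost.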
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon anyway.

This can be very interesting if the interruption cost is high or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

NC    Number of components
NWc   Number of working states for component c
NPMc  Number of preventive maintenance states for component c
NCMc  Number of corrective maintenance states for component c
Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  Electricity state at the current stage
jc, c ∈ {1, ..., NC}   State of component c for the next stage
jNC+1                  Electricity state for the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}  State of component c at stage k
xc                     A component state
xNC+1,k                Electricity state at stage k
uck                    Maintenance decision for component c at stage k

Probability functions

λc(i)  Failure probability function for component c

Sets

Ωxc       State space for component c
ΩxNC+1    Electricity state space
Ωuc(ic)   Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, ∅ otherwise
9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏ c=1..NC P(jc, 0, ic)
Case 2

If one of the components is in maintenance, or preventive maintenance is decided for some component:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏ c=1..NC Pc

with Pc =
  P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
  1             if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
  0             otherwise
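The two cases can be combined into a single function. The sketch below is an illustrative encoding on a tiny two-component example; the three-state component (W0, W1, CM1), its matrices and the NPM = 1 convention (PM1 coinciding with W0) are hypothetical choices for the sketch.

```python
def joint_transition_prob(i, j, u, P0, P1, is_working):
    """P((j1..jNC) | (u1..uNC), (i1..iNC)) for the series system.
    P0: per-component ageing / maintenance-progression matrix (u = 0),
    P1: per-component replacement matrix (u = 1)."""
    system_up = all(is_working(ic) for ic in i) and not any(u)
    prob = 1.0
    for ic, jc, uc in zip(i, j, u):
        if system_up:
            prob *= P0[ic][jc]                  # Case 1: every component ages
        elif uc == 1 or not is_working(ic):
            P = P1 if uc == 1 else P0
            prob *= P[ic][jc]                   # maintenance starts or progresses
        else:
            prob *= 1.0 if jc == ic else 0.0    # Case 2: working components freeze
    return prob

# Tiny per-component example: states 0 = W0, 1 = W1, 2 = CM1 (NW = 1, NCM = 2)
P0 = [[0.0, 0.9, 0.1],      # W0 ages to W1 or fails into CM1
      [0.0, 0.9, 0.1],      # W1 keeps working or fails
      [1.0, 0.0, 0.0]]      # CM1 -> new component W0
P1 = [[1.0, 0.0, 0.0],      # replacement: back to W0 (PM1 == W0 when NPM = 1)
      [1.0, 0.0, 0.0],
      [1.0, 0.0, 0.0]]
is_working = lambda s: s in (0, 1)

# Case 1: both components working and surviving the stage
p_up = joint_transition_prob((0, 0), (1, 1), (0, 0), P0, P1, is_working)
# Case 2: component 1 is in CM1, so working component 2 stays frozen in W1
p_down = joint_transition_prob((2, 1), (0, 1), (0, 0), P0, P1, is_working)
```

In Case 1 the joint probability factorizes (0.9 · 0.9), while in Case 2 the maintenance progression of component 1 is certain and the frozen component contributes a factor of one.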
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑ c=1..NC Cc

with Cc =
  CCMc  if ic ∈ {CM1, ..., CM(NCMc−1)} or jc = CM1
  CPMc  if ic ∈ {PM1, ..., PM(NPMc−1)} or jc = PM1
  0     otherwise
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
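The computation above can be reproduced by a short backward-induction script; the arc costs C(k, i, j) below are read off from the expressions in the solution, and the decision is encoded as the index of the chosen successor node.

```python
# Arc costs C(k, i, j): cost of moving from node i at stage k to node j at k+1
# (A = stage 0; B, C, D = stage 1; E, F, G = stage 2; H, I, J = stage 3)
cost = {
    0: {0: {0: 2, 1: 4, 2: 3}},
    1: {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},
    2: {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},
    3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},
}
N = 4
J = {N: {0: 0.0}}                      # terminal cost: phi(0) = 0
policy = {}
for k in range(N - 1, -1, -1):         # backward induction (value iteration)
    J[k] = {}
    for i, arcs in cost[k].items():
        best_j = min(arcs, key=lambda j: arcs[j] + J[k + 1][j])
        J[k][i] = arcs[best_j] + J[k + 1][best_j]
        policy[(k, i)] = best_j        # optimal successor node
```

Running this reproduces the cost-to-go values of the solution, in particular J*0(0) = 8 with the optimal first move to the third successor (index 2).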
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems – life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems – cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants – introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Chapter 3
Introduction to the Power System
This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.
3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation. These are the generation units that produce the power, e.g. hydropower units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission. The transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.
3. Distribution. The distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption. The consumers can be divided into different categories: industry, commercial, household, office, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs also depend on the duration of the outage.
The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as blackouts.
3.1.2 Maintenance in Power Systems

The objective is to find the right way to do maintenance. Corrective maintenance and preventive maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined in detail the approach and its different steps. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more
attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems; [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.
3.2 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost. Cost for the maintenance team that performs maintenance actions.

• Spare part cost. The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost. Special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an off-shore wind turbine.

• Energy production. The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost. If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost. Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower. The size and availability of the maintenance staff is limited.

• Maintenance equipment. The equipment needed for undertaking the maintenance must be available.

• Weather. The weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of spare parts. If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts. Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information. If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data. Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
41 Introduction
Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of the system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action results in an immediate cost (or reward) and influences the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
411 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:
An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not influence the actual evolution of the system and the possible actions.

Basically, in maintenance problems it would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.
412 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action made.

If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are of interest.
413 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
414 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the time interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 6. Continuous decision making refers to optimal control theory and will not be discussed here.
415 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 42).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
42 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.
421 Problem Formulation
The three main parts of a DP model are its state and decision spaces, dynamic and cost functions, and objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω_X^k. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω_U^k(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).
Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

  J*_0(X_0) = min_{U_k} [ Σ_{k=0}^{N−1} C_k(X_k, U_k) + C_N(X_N) ]

  subject to X_{k+1} = f_k(X_k, U_k), k = 0, ..., N−1
N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
X_k: State at stage k
U_k: Decision/action at stage k
C_k(i, u): Cost function
C_N(i): Terminal cost for state i
f_k(i, u): Dynamic function
J*_0(i): Optimal cost-to-go starting from state i
18
422 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

  J*_k(i) = min_{u ∈ Ω_U^k(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }     (41)

J*_k(i): Optimal cost-to-go from stage k to N, starting from state i
The value iteration algorithm is a direct consequence of the optimality equation:

  J*_N(i) = C_N(i)   ∀i ∈ Ω_X^N

  J*_k(i) = min_{u ∈ Ω_U^k(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }   ∀i ∈ Ω_X^k

  U*_k(i) = argmin_{u ∈ Ω_U^k(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }   ∀i ∈ Ω_X^k

u: Decision variable
U*_k(i): Optimal decision/action at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
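The backward recursion above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the thesis; the function names and the generic interface (states, actions, f, C) are assumptions chosen to mirror the notation above.

```python
def value_iteration(N, states, actions, f, C, C_N):
    """Backward value iteration for a deterministic finite-horizon DP.

    states[k]     : admissible states at stage k (k = 0..N)
    actions(k, i) : admissible actions in state i at stage k
    f(k, i, u)    : dynamic function, returns the next state
    C(k, i, u)    : stage cost; C_N(i): terminal cost
    Returns the cost-to-go tables J[k][i] and optimal decisions U[k][i].
    """
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = C_N(i)                     # initialisation: J*_N = C_N
    for k in range(N - 1, -1, -1):           # backwards, stops after k = 0
        for i in states[k]:
            best = min(actions(k, i),
                       key=lambda u: C(k, i, u) + J[k + 1][f(k, i, u)])
            U[k][i] = best
            J[k][i] = C(k, i, best) + J[k + 1][f(k, i, best)]
    return J, U
```

The tables U[k] together form the optimal policy, and J[0] contains the optimal cost-to-go from each initial state.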
423 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: a stage-wise directed graph with node A at stage 0; nodes B, C, D at stage 1; nodes E, F, G at stage 2; nodes H, I, J at stage 3; and node K at stage 4. Each arc is labeled with its cost; e.g. A→B costs 2, B→E costs 4, B→F costs 6, F→J costs 2 and J→K costs 7.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of every possible path. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4231 Problem Formulation
The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:

  Ω_X^0 = {A} = {0}
  Ω_X^1 = {B, C, D} = {0, 1, 2}
  Ω_X^2 = {E, F, G} = {0, 1, 2}
  Ω_X^3 = {H, I, J} = {0, 1, 2}
  Ω_X^4 = {K} = {0}
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable; it is also possible to have a multi-variable state space, for which X_k would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:
  Ω_U^k(i) = {0, 1} for i = 0;  {0, 1, 2} for i = 1;  {1, 2} for i = 2,   for k = 1, 2, 3

  Ω_U^0(0) = {0, 1, 2}   for k = 0
For example, Ω_U^1(0) = Ω_U(B) = {0, 1}, with U_1(0) = 0 for the transition B → E, or U_1(0) = 1 for the transition B → F.

Another example: Ω_U^1(2) = Ω_U(D) = {1, 2}, with u_1(2) = 1 for the transition D → F, or u_1(2) = 2 for the transition D → G.
A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple, thanks to the notations used: f_k(i, u) = u.

The transition costs are defined to be equal to the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B → E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function
  J*_0(0) = min_{U_k ∈ Ω_U^k(X_k)} [ Σ_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) ]

  subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, 2, 3
4232 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.
The solutions of the algorithm are given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A → D → G → I → K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4}, with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2, μ_1(2) = 2).
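A small script in the same spirit can solve stage-wise shortest path problems by backward value iteration. Note that the arc costs below are invented placeholders (they are not the arc costs of the thesis figure), so the resulting optimal cost differs from the example above.

```python
# Stage-wise arc costs: cost[k][(i, j)] is the cost of moving from state i
# at stage k to state j at stage k+1.  These numbers are invented
# placeholders, not the arc costs of the figure.
cost = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},             # A -> B, C, D
    {(0, 0): 4, (0, 1): 6, (1, 1): 1, (2, 2): 2},  # B, C, D -> E, F, G
    {(0, 0): 3, (1, 2): 2, (2, 2): 1},             # E, F, G -> H, I, J
    {(0, 0): 5, (1, 0): 2, (2, 0): 7},             # H, I, J -> K
]

def solve(cost):
    """Backward value iteration over the stage graph."""
    N = len(cost)
    J = {0: 0.0}                       # terminal: cost-to-go at K is 0
    policy = []
    for k in range(N - 1, -1, -1):
        Jk, Uk = {}, {}
        for (i, j), c in cost[k].items():
            if j in J and c + J[j] < Jk.get(i, float("inf")):
                Jk[i], Uk[i] = c + J[j], j
        J = Jk
        policy = [Uk] + policy
    return J, policy
```

For this placeholder instance, solve(cost) returns the optimal cost-to-go from A together with the optimal decision at each node.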
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. It introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
51 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is then not deterministic as in Chapter 4: it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω_X^k.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω_U^k(i).
Dynamic of the System and Transition Probability
In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

  X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N−1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control at stage k are i and u. These probabilities can also depend on the stage:

  P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:

  P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
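The remark above can be made concrete with a small sketch: fixing one control per state collapses the controlled transition probabilities into an ordinary Markov chain matrix. The two-state component model and its probabilities below are invented for illustration.

```python
import numpy as np

# Hypothetical stationary MDP for a single component:
# P[u][i][j] = P(X_{k+1} = j | X_k = i, U_k = u), state 0 = working, 1 = failed
P = {
    "keep":    np.array([[0.9, 0.1],    # a working component may fail
                         [0.0, 1.0]]),  # a failed component stays failed
    "replace": np.array([[1.0, 0.0],
                         [1.0, 0.0]]),  # replacement restores state 0
}

# Fixing one control per state yields a Markov chain transition matrix
policy = {0: "keep", 1: "replace"}
P_mu = np.array([P[policy[i]][i] for i in (0, 1)])
# P_mu is row-stochastic: each row sums to 1
```

The matrix P_mu can then be analysed with standard Markov chain tools (stationary distribution, expected time to failure, etc.).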
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

  C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

  J*(X_0) = min_{U_k ∈ Ω_U^k(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

  subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N−1
N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
X_k: State at stage k
U_k: Decision/action at stage k
ω_k(i, u): Probabilistic function of the disturbance
C_k(j, u, i): Cost function
C_N(i): Terminal cost for state i
f_k(i, u, ω): Dynamic function
J*_0(i): Optimal cost-to-go starting from state i
52 Optimality Equation
The optimality equation for stochastic finite horizon DP is:

  J*_k(i) = min_{u ∈ Ω_U^k(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]     (51)

This equation defines the condition for a cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

  J*_k(i) = min_{u ∈ Ω_U^k(i)} Σ_{j ∈ Ω_X^{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]     (52)
Ω_X^k: State space at stage k
Ω_U^k(i): Decision space at stage k for state i
P_k(j, u, i): Transition probability function
53 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (52). The algorithm starts from the last stage; by backward recursion, it determines at each stage the optimal decision for each state of the system.
  J*_N(i) = C_N(i)   ∀i ∈ Ω_X^N   (initialisation)

  While k ≥ 0 do
    J*_k(i) = min_{u ∈ Ω_U^k(i)} Σ_{j ∈ Ω_X^{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀i ∈ Ω_X^k
    U*_k(i) = argmin_{u ∈ Ω_U^k(i)} Σ_{j ∈ Ω_X^{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀i ∈ Ω_X^k
    k ← k − 1
u: Decision variable
U*_k(i): Optimal decision/action at stage k for state i
The recursion finishes when the first stage is reached
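The stochastic value iteration recursion can be sketched as follows; the interface (functions P, C, C_N and the per-stage state lists) is an assumption of this example, chosen to mirror the notation above.

```python
def stochastic_value_iteration(N, states, actions, P, C, C_N):
    """Backward value iteration for a finite-horizon SDP.

    P(k, j, u, i) : transition probability P(X_{k+1} = j | X_k = i, U_k = u)
    C(k, j, u, i) : transition cost; C_N(i): terminal cost
    Returns the stage-0 cost-to-go table and the per-stage optimal decisions.
    """
    J = {i: C_N(i) for i in states[N]}       # initialisation: J*_N = C_N
    policy = []
    for k in range(N - 1, -1, -1):           # backward recursion
        Jk, Uk = {}, {}
        for i in states[k]:
            def expected(u):
                # expected transition cost plus cost-to-go at the next stage
                return sum(P(k, j, u, i) * (C(k, j, u, i) + J[j])
                           for j in states[k + 1])
            Uk[i] = min(actions(k, i), key=expected)
            Jk[i] = expected(Uk[i])
        J = Jk
        policy.insert(0, Uk)
    return J, policy
```

Compared with the deterministic sketch, each action is now scored by an expectation over the next-stage states instead of a single successor.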
54 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:

• N stages
• N_X state variables; the size of the set for each state variable is S
• N_U control variables; the size of the set for each control variable is A

The time complexity of the value iteration algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem thus increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
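A quick back-of-the-envelope calculation illustrates the curse. The figures below (52 weekly stages, 10 levels per state variable, 3 actions) are invented for illustration:

```python
def vi_operations(N, S, n_x, A, n_u):
    """Rough operation count N * S**(2*n_x) * A**n_u for finite-horizon VI."""
    return N * S ** (2 * n_x) * A ** n_u

# Doubling the number of state variables squares the dominant S**n_x factor:
small = vi_operations(N=52, S=10, n_x=2, A=3, n_u=1)   # 1 560 000
large = vi_operations(N=52, S=10, n_x=4, A=3, n_u=1)   # 15 600 000 000
```

Going from two to four state variables multiplies the work by a factor S^(2·2) = 10 000 here, which is exactly the exponential growth the curse of dimensionality refers to.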
55 Ideas for a Maintenance Optimization Model
In this section possible state variables for a maintenance models based on SDP arediscussed
551 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for a component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
552 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasts could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions on offshore wind farms.
553 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on time, if the system dynamics are not stationary).

This memorylessness condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system as well as the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
61 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω_X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω_U(i).
The objective is to find the optimal policy μ*, which minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are incurred.

  J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

  subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1

μ: Decision policy
J*(i): Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is the discount factor (0 < α < 1). The cost at stage k for a discounted IHSDP has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum will converge (as a decreasing geometric progression).

  J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

  subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1

α: Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize:

  J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

  subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1
62 Optimality Equations
The optimality equations are formulated using the transition probabilities P_ij(u).

The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

  J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + J*(j)]   ∀i ∈ Ω_X

J_μ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:

  J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + α · J*(j)]   ∀i ∈ Ω_X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 66.
63 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed does. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative criterion must be determined to stop the algorithm.

An alternative is the Policy Iteration (PI) algorithm. This latter terminates after a finite number of iterations.
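A minimal sketch of value iteration for the discounted case, with a stopping tolerance in place of the theoretically infinite number of iterations. The code is illustrative, not from the thesis; every action is assumed admissible in every state for brevity.

```python
def discounted_value_iteration(P, C, alpha, tol=1e-9):
    """Value iteration for a discounted infinite-horizon MDP.

    P[u][i][j] and C[u][i][j] are nested lists of transition probabilities
    and costs; alpha in (0, 1) is the discount factor.
    """
    n = len(next(iter(P.values())))
    J = [0.0] * n

    def q(i, u, J):
        # expected one-step cost plus discounted cost-to-go
        return sum(P[u][i][j] * (C[u][i][j] + alpha * J[j]) for j in range(n))

    while True:
        J_new = [min(q(i, u, J) for u in P) for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(J, J_new)) < tol
        J = J_new
        if converged:
            break
    policy = [min(P, key=lambda u, i=i: q(i, u, J)) for i in range(n)]
    return J, policy
```

The returned greedy policy is the stationary policy associated with the converged cost-to-go vector.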
64 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively; the process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ^0. It can then be described by the following steps:

Step 1: Policy Evaluation

If μ^{q+1} = μ^q, stop the algorithm. Else, J_{μ^q}(i), the solution of the following linear system, is calculated:

  J_{μ^q}(i) = Σ_{j ∈ Ω_X} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + J_{μ^q}(j)]   ∀i ∈ Ω_X

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ^q.
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

  μ^{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ^q}(j)]   ∀i ∈ Ω_X

Go back to the policy evaluation step.
The process stops when μ^{q+1} = μ^q.
At each iteration the algorithm improves the policy. If the initial policy μ^0 is already good, then the algorithm will converge quickly to the optimal solution.
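The two steps can be sketched for the discounted variant, where the evaluation system (I − α·P_μ)J = c_μ is always solvable. This is an illustrative sketch; the nested-dictionary interface and the assumption that every action is admissible in every state are choices of this example.

```python
import numpy as np

def policy_iteration(P, C, alpha):
    """Policy iteration for a discounted MDP.

    P[u] and C[u] are n x n arrays of transition probabilities and costs.
    """
    actions = list(P)
    n = P[actions[0]].shape[0]
    mu = [actions[0]] * n                           # arbitrary initial policy
    while True:
        # Step 1: policy evaluation -- solve (I - alpha * P_mu) J = c_mu
        P_mu = np.array([P[mu[i]][i] for i in range(n)])
        c_mu = np.array([(P[mu[i]][i] * C[mu[i]][i]).sum() for i in range(n)])
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        mu_new = [min(actions,
                      key=lambda u: (P[u][i] * (C[u][i] + alpha * J)).sum())
                  for i in range(n)]
        if mu_new == mu:        # the policy is a solution of its own improvement
            return J, mu
        mu = mu_new
```

Each pass through the loop performs one evaluation/improvement cycle; termination after finitely many cycles follows from the finite number of policies.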
65 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ^k}(i) that must be chosen higher than the real value J_{μ^k}(i).
  While m ≥ 0 do
    J^m_{μ^k}(i) = Σ_{j ∈ Ω_X} P(j, μ^k(i), i) · [C(j, μ^k(i), i) + J^{m+1}_{μ^k}(j)]   ∀i ∈ Ω_X
    m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ^k} is approximated by J^0_{μ^k}.
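The approximate evaluation step can be sketched as follows. This is illustrative code; P_mu and c_mu denote the transition matrix and expected one-stage cost of the fixed policy, and J_init is the required over-estimate of the true cost-to-go.

```python
import numpy as np

def evaluate_policy_approx(P_mu, c_mu, J_init, alpha, M):
    """Approximate policy evaluation for modified policy iteration.

    Instead of solving the linear system, the policy's Bellman operator is
    applied M times starting from J_init, which should over-estimate the
    true cost-to-go of the policy (discounted case shown here).
    """
    J = np.array(J_init, dtype=float)
    for _ in range(M):
        J = c_mu + alpha * P_mu @ J     # one value iteration sweep
    return J
```

Starting from above, the iterates decrease monotonically toward the true cost-to-go of the policy, so even a moderate M gives a usable estimate for the improvement step.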
66 Average Cost-to-go Problems
The methods presented in Sections 62-65 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X̄ ∈ Ω_X, there exist a unique λ_μ and vector h_μ such that:

  h_μ(X̄) = 0

  λ_μ + h_μ(i) = Σ_{j ∈ Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)]   ∀i ∈ Ω_X
This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and the optimal policy satisfy the Bellman equation:

  λ* + h*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ Ω_X

  μ*(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ Ω_X
661 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. X̄ is an arbitrary state and h^0(i) is chosen arbitrarily.
  H^k = min_{u ∈ Ω_U(X̄)} Σ_{j ∈ Ω_X} P(j, u, X̄) · [C(j, u, X̄) + h^k(j)]

  h^{k+1}(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)] − H^k   ∀i ∈ Ω_X

  μ^{k+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)]   ∀i ∈ Ω_X
The sequence h^k will converge if the Markov decision process is unichain, and the algorithm then converges to the optimal policy. The number of iterations needed is, however, infinite in theory.
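A sketch of relative value iteration, with state 0 used as the arbitrary reference state X̄ (illustrative code; the nested-list interface is an assumption of this example):

```python
def relative_value_iteration(P, C, n_iter=200):
    """Relative value iteration for an average cost-per-stage MDP.

    P[u][i][j] and C[u][i][j] are nested lists of transition probabilities
    and costs; the MDP is assumed unichain.  State 0 plays the role of the
    arbitrary reference state.
    """
    n = len(next(iter(P.values())))
    h = [0.0] * n

    def q(i, u):
        return sum(P[u][i][j] * (C[u][i][j] + h[j]) for j in range(n))

    H = 0.0
    for _ in range(n_iter):
        H = min(q(0, u) for u in P)           # offset at the reference state
        h = [min(q(i, u) for u in P) - H for i in range(n)]
    policy = [min(P, key=lambda u, i=i: q(i, u)) for i in range(n)]
    return H, h, policy    # H converges to the optimal average cost
```

Subtracting the offset H at every sweep keeps the iterates bounded, and H itself converges to λ*.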
662 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: X̄ can be chosen arbitrarily.
Step 1: Evaluation of the Policy

If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) ∀i ∈ Ω_X, stop the algorithm. Else, solve the system of equations:

  h^q(X̄) = 0

  λ^q + h^q(i) = Σ_{j ∈ Ω_X} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + h^q(j)]   ∀i ∈ Ω_X
Step 2: Policy Improvement

  μ^{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h^q(j)]   ∀i ∈ Ω_X

  q ← q + 1
67 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case the optimality equation is:

  J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)]   ∀i ∈ Ω_X

J*(i) is then the solution of the following linear programming model:

  Maximize   Σ_{i ∈ Ω_X} J(i)

  Subject to J(i) ≤ Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + α · J(j)]   ∀i ∈ Ω_X, ∀u ∈ Ω_U(i)
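The LP construction can be sketched with SciPy's linprog (assuming SciPy is available; the two-state MDP below is invented for illustration). Since costs are minimized, the cost-to-go is the largest J satisfying the constraints, so the objective maximizes the sum of J(i):

```python
import numpy as np
from scipy.optimize import linprog

# Tiny discounted MDP, invented for illustration: P[u][i][j], C[u][i][j]
P = {"a": np.array([[0.0, 1.0], [0.0, 1.0]]),
     "b": np.array([[1.0, 0.0], [0.0, 1.0]])}
C = {"a": np.array([[0.0, 2.0], [0.0, 0.0]]),
     "b": np.array([[1.0, 0.0], [0.0, 0.0]])}
alpha, n = 0.9, 2

# One constraint per (action, state):
#   J(i) - alpha * sum_j P(j,u,i) J(j) <= sum_j P(j,u,i) C(j,u,i)
A_ub, b_ub = [], []
for u in P:
    for i in range(n):
        row = -alpha * P[u][i]
        row[i] += 1.0
        A_ub.append(row)
        b_ub.append((P[u][i] * C[u][i]).sum())

# maximise sum_i J(i), i.e. minimise -sum_i J(i)
res = linprog(c=-np.ones(n), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n)
J = res.x
```

At the optimum, the constraints that hold with equality identify the optimal actions in each state.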
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
68 Efficiency of the Algorithms
For details about the complexity of the algorithms [28] and [29] are recommended
Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m; a DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite quickly if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
69 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDPs).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and the actions are not made continuously (that kind of problem refers to optimal control theory).
SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
71 Introduction
The problem of the methods presented in the previous chapter is that the modelsare untractable for large state space In this chapter methods to overcome thisproblem by approximation are presented They make use of supervised learningtechniques
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i), if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ, using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average-cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from the state X_k is

V(X_k) = ∑_{n=k}^{N−1} C(X_n, X_{n+1})

where V(X_k) denotes the cost-to-go of a trajectory starting from state X_k.
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) ∑_{m=1}^{K} V(i_m)

where V(i_m) is the cost-to-go of the trajectory starting from state i at its mth visit.
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.
From a trajectory point of view,

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

where γ_{X_k} corresponds to 1/m, with m the number of times X_k has already been visited by trajectories.
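As an illustration, the averaging form of this update (γ = 1/m) can be sketched as follows. The trajectory format, the costs, and the helper name `evaluate_policy` are hypothetical illustrations, not from the thesis:

```python
# Policy evaluation by simulation: each visited state uses the remaining
# trajectory as a sample of its cost-to-go, and the recursive update with
# gamma = 1/m reduces to a running average over the m samples seen so far.

def evaluate_policy(trajectories, cost):
    """trajectories: state sequences ending in the terminal state;
    cost(x, y): cost of the transition x -> y under the fixed policy."""
    J, visits = {}, {}
    for traj in trajectories:
        V, tails = 0.0, []
        for k in range(len(traj) - 2, -1, -1):   # accumulate tail costs
            V += cost(traj[k], traj[k + 1])
            tails.append((traj[k], V))
        for state, v in tails:
            m = visits[state] = visits.get(state, 0) + 1
            # J(i) := J(i) + (1/m) * (V_m - J(i))  -- running average
            J[state] = J.get(state, 0.0) + (v - J.get(state, 0.0)) / m
    return J
```

With two trajectories visiting the same state, the estimate is simply the mean of the two observed costs-to-go.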
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).
At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the lth transition has just been generated; then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm updates only the current state:

J(X_l) := J(X_l) + γ_{X_l} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]
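As a minimal illustration of the TD(0) update, the sketch below applies it repeatedly to samples from a deterministic two-state chain. The chain, the constant step size, and the function name are assumptions for illustration, not from the thesis:

```python
# TD(0) sketch: J(x) := J(x) + step * [C(x, x') + J(x') - J(x)].
# The terminal state is represented by None and has cost-to-go 0
# by convention (stochastic shortest path setting).

def td0(transitions, num_states, step=0.1, sweeps=2000):
    """transitions: list of (x, x_next, cost) samples."""
    J = [0.0] * num_states
    for _ in range(sweeps):
        for x, x_next, c in transitions:
            target = c + (0.0 if x_next is None else J[x_next])
            J[x] += step * (target - J[x])   # temporal-difference update
    return J

# Deterministic chain 0 -> 1 -> terminal with unit transition costs;
# the exact cost-to-go is J(0) = 2, J(1) = 1.
J = td0([(0, 1, 1.0), (1, None, 1.0)], num_states=2)
```

Here a small constant step size is used instead of the decreasing γ_{X_k} = 1/m; for a deterministic chain both converge to the same values.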
Q-factors: Once J^{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^{μ_k}(i, u) = ∑_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J^{μ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q^{μ_k}(i, u)
This is in fact an approximate version of the policy iteration algorithm, since J^μ and Q^{μ_k} have been estimated using the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = ∑_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain

Q*(i, u) = ∑_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3). Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C_k + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]
with γ defined as for TD.
The trade-off between exploration and exploitation: The convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called a greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
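The Q-learning update combined with an ε-greedy exploration/exploitation trade-off can be sketched as below. The two-state replacement-style example, the constant step size, and the function names are hypothetical illustrations chosen for this sketch, not part of the thesis:

```python
import random

def q_learning(step_fn, start, actions, episodes=5000, gamma=0.1,
               epsilon=0.1, seed=1):
    """step_fn(x, u) -> (x_next, cost); x_next is None at the terminal state.
    Implements Q(x,u) := (1-gamma)*Q(x,u) + gamma*[c + min_v Q(x',v)]."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        x = start
        while x is not None:
            if rng.random() < epsilon:                       # exploration
                u = rng.choice(actions)
            else:                                            # exploitation
                u = min(actions, key=lambda a: Q.get((x, a), 0.0))
            x_next, c = step_fn(x, u)
            future = 0.0 if x_next is None else min(
                Q.get((x_next, a), 0.0) for a in actions)
            Q[(x, u)] = ((1 - gamma) * Q.get((x, u), 0.0)
                         + gamma * (c + future))
            x = x_next
    return Q

# Hypothetical problem: in state 0, action 0 ("wait") costs 1 per stage and
# stays in 0; action 1 ("replace") costs 1.5 and terminates.  The optimal
# Q-factors are Q(0,1) = 1.5 and Q(0,0) = 1 + 1.5 = 2.5.
def step(x, u):
    return (None, 1.5) if u == 1 else (0, 1.0)

Q = q_learning(step, start=0, actions=[0, 1])
```

The greedy policy extracted from the learned Q-factors picks action 1 ("replace") in state 0, since its Q-value is lower.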
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system, through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces, this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J^μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J^μ. In the table representation investigated previously, J^μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.
Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).
There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem can be:
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no real training set exists: the training set is obtained either by simulation or from real-time samples. This is already an approximation of the real function.
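As a toy illustration of replacing a table by a parameter vector r, the sketch below fits a linear approximation J̃(i, r) = r0 + r1·i to samples of a cost-to-go function by least squares. The underlying function and the feature choice are hypothetical examples of the idea, not from the thesis:

```python
# A tabular J(i) over many states is replaced by the 2-parameter vector
# r = (r0, r1): only r is stored, and J_tilde(i, r) = r0 + r1 * i
# generalizes to states not present in the training set.

def fit_linear(samples):
    """samples: list of (state, observed cost-to-go). Returns (r0, r1)."""
    n = len(samples)
    sx = sum(i for i, _ in samples)
    sy = sum(v for _, v in samples)
    sxx = sum(i * i for i, _ in samples)
    sxy = sum(i * v for i, v in samples)
    r1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # least-squares slope
    r0 = (sy - r1 * sx) / n                          # least-squares intercept
    return r0, r1

# Noiseless samples of a (hypothetical) linear cost-to-go J(i) = 3 + 2i:
samples = [(i, 3.0 + 2.0 * i) for i in range(6)]
r0, r1 = fit_linear(samples)
```

In reinforcement learning the "observed cost-to-go" values would themselves come from simulated or real trajectories, which is the additional approximation mentioned above.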
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is raised. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Methods: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model; average cost-to-go, discounted and shortest-path approaches possible
- Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization
- Methods: value iteration (VI), which can converge fast for a high discount factor; policy iteration (PI), faster in general; linear programming, which allows additional constraints but with a state space more limited than for VI and PI

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application in maintenance optimization: optimization of inspection-based maintenance
- Methods: same as MDP (average cost-to-go approach)
- Advantages/disadvantages: more complex

Approximate Dynamic Programming
- Characteristics: can handle larger state spaces than classical MDP methods
- Possible application in maintenance optimization: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages/disadvantages: can work without an explicit model
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component, and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was incorporated into the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers:

NE: number of electricity scenarios
NW: number of working states for the component
NPM: number of preventive maintenance states for the component
NCM: number of corrective maintenance states for the component

Costs:

CE(s, k): electricity cost at stage k for electricity state s
CI: cost per stage for interruption
CPM: cost per stage of preventive maintenance
CCM: cost per stage of corrective maintenance
CN(i): terminal cost if the component is in state i

Variables:

i1: component state at the current stage
i2: electricity state at the current stage
j1: possible component state for the next stage
j2: possible electricity state for the next stage

State and control space:

x1_k: component state at stage k
x2_k: electricity state at stage k

Probability functions:

λ(t): failure rate of the component at age t
λ(i): failure rate of the component in state Wi

Sets:

Ω_{x1}: component state space
Ω_{x2}: electricity state space
Ω_U(i): decision space for state i

State notations:

W: working state
PM: preventive maintenance state
CM: corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.
• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.
• If the system is not working, a cost for interruption CI per stage is considered.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age), and x2_k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k)ᵀ,  x1_k ∈ Ω_{x1}, x2_k ∈ Ω_{x2}   (9.1)

Ω_{x1} is the set of possible states for the component, and Ω_{x2} the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case, Tmax can for example correspond to the time after which λ(t) exceeds a given threshold. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, rounded to the closest integer, in both cases.
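The choice of Tmax and NW can be sketched as follows; the increasing failure rate, the limit λmax, and the helper name `num_working_states` are hypothetical assumptions for illustration:

```python
# Sketch: find Tmax such that lambda(Tmax) = lambda_max by bisection
# (lambda is assumed increasing), then NW = Tmax / Ts rounded to the
# closest integer.

def num_working_states(lam, lam_max, Ts, t_lo=0.0, t_hi=1e6):
    for _ in range(100):                 # bisection on [t_lo, t_hi]
        mid = 0.5 * (t_lo + t_hi)
        if lam(mid) < lam_max:
            t_lo = mid
        else:
            t_hi = mid
    return max(1, round(0.5 * (t_lo + t_hi) / Ts))

# Linearly increasing failure rate (per hour, assumed): lambda(t) = 2e-4 * t,
# so lambda_max = 0.4 gives Tmax = 2000 h; with Ts = 100 h this yields NW = 20.
NW = num_working_states(lambda t: 2e-4 * t, lam_max=0.4, Ts=100.0)
```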
Figure 9.1: Example of the Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0 (Wq → Wq+1 with probability 1 − Ts·λ(q), Wq → CM1 with probability Ts·λ(q), and the PM and CM chains returning to W0 with probability 1); dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_{x1} = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_{x1} = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}
Electricity scenario state
Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each corresponding to one possible electricity scenario: x2_k ∈ Ω_{x2} = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed, and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3 (electricity prices in SEK/MWh, between 200 and 500, plotted over the stages, with one curve per scenario).
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and Ω_U(i) = ∅ otherwise.
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q·Ts).

The transition probabilities for the component state are stationary, and can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices, and j2 by the columns.
Table 9.1: Transition probabilities

i1                           u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0   CM1      λ(Wq)
WNW                          0   WNW      1 − λ(WNW)
WNW                          0   CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}         1   PM1      1
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1    1
PM(NPM−1)                    ∅   W0       1
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1    1
CM(NCM−1)                    ∅   W0       1
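The stationary component transitions of Table 9.1 can be sketched as a dictionary keyed by (state, decision), with the empty decision encoded as None. The per-stage failure probabilities below are hypothetical numbers, chosen to match the Figure 9.1 example (NW = 4, NPM = 2, NCM = 3):

```python
# Build the component transition structure of Table 9.1.
# lam[q]: per-stage failure probability in state Wq, q = 0..NW.

def component_transitions(lam, NW, NPM, NCM):
    P = {}  # P[(i1, u)] = {j1: probability}
    for q in range(NW + 1):
        nxt = f"W{min(q + 1, NW)}"           # WNW is absorbing while working
        P[(f"W{q}", 0)] = {nxt: 1.0 - lam[q], "CM1": lam[q]}
        P[(f"W{q}", 1)] = {"PM1": 1.0}       # preventive replacement starts
    for q in range(1, NPM - 1):              # PM chain, no decision
        P[(f"PM{q}", None)] = {f"PM{q + 1}": 1.0}
    P[(f"PM{NPM - 1}", None)] = {"W0": 1.0}
    for q in range(1, NCM - 1):              # CM chain, no decision
        P[(f"CM{q}", None)] = {f"CM{q + 1}": 1.0}
    P[(f"CM{NCM - 1}", None)] = {"W0": 1.0}
    return P

P = component_transitions(lam=[0.01, 0.02, 0.05, 0.1, 0.2],
                          NW=4, NPM=2, NCM=3)
```

With NPM = 2 and NCM = 3, the maintenance chains reduce to PM1 → W0 and CM1 → CM2 → W0, as in Figure 9.1.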
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = [ 1 0 0 ; 0 1 0 ; 0 0 1 ]

P2_E = [ 1/3 1/3 1/3 ; 1/3 1/3 1/3 ; 1/3 1/3 1/3 ]

P3_E = [ 0.6 0.2 0.2 ; 0.2 0.6 0.2 ; 0.2 0.2 0.6 ]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1_E P1_E P1_E P3_E P3_E P2_E P2_E P2_E P3_E P1_E P1_E P1_E
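Sampling the non-stationary electricity state can be sketched as follows, using the matrices of Table 9.2 and the stage schedule of Table 9.3; the fixed seed and the function name are assumptions for illustration:

```python
import random

# Stage-dependent scenario transition matrices (rows: current scenario i2,
# columns: next scenario j2), as in Tables 9.2 and 9.3.
P1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2 = [[1/3, 1/3, 1/3]] * 3
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
SCHEDULE = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

def simulate_scenarios(s0, schedule, rng=None):
    """Sample one scenario path over the stages of the schedule."""
    rng = rng or random.Random(0)
    path = [s0]
    for Pk in schedule:
        row = Pk[path[-1]]
        u, acc, nxt = rng.random(), 0.0, 0
        for j, p in enumerate(row):       # inverse-CDF sampling of j2
            acc += p
            if u < acc:
                nxt = j
                break
        path.append(nxt)
    return path

path = simulate_scenarios(0, SCHEDULE)    # 13 states for a 12-stage horizon
```

During the first three stages the matrix is the identity (P1_E), so the scenario cannot change there; switches can only occur in the "transient" stages using P2_E or P3_E, mirroring the dry-year/wet-year assumption above.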
9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• a reward for electricity generation, G·Ts·CE(i2, k) (depending on the electricity scenario state i2 and the stage k);

• a cost for maintenance, CCM or CPM;

• a cost for interruption, CI.
Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                           u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1     G·Ts·CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0   CM1      CI + CCM
WNW                          0   WNW      G·Ts·CE(i2, k)
WNW                          0   CM1      CI + CCM
Wq                           1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1    CI + CPM
PM(NPM−1)                    ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1    CI + CCM
CM(NCM−1)                    ∅   W0       CI + CCM
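Putting Tables 9.1 and 9.4 together, the finite horizon backward recursion J_k(i) = min_u ∑_j P(j, u, i)·[Ck(j, u, i) + J_{k+1}(j)] can be sketched as below. For brevity a single electricity scenario is assumed, so the generation reward enters as a constant negative cost; all numbers (NW = 2, NPM = NCM = 2, the costs and failure probabilities) are hypothetical:

```python
# Backward value iteration for a small one-component instance with
# states W0, W1, W2 (working), PM1 and CM1.

def solve(P, cost, states, decisions, N, terminal):
    """P[(i, u)] = {j: prob}; returns stage-0 cost-to-go and the policy."""
    J = dict(terminal)
    policy = []
    for k in range(N - 1, -1, -1):                 # backward over the stages
        Jk, uk = {}, {}
        for i in states:
            best_u, best_v = None, float("inf")
            for u in decisions(i):
                v = sum(p * (cost(i, u, j) + J[j])
                        for j, p in P[(i, u)].items())
                if v < best_v:
                    best_u, best_v = u, v
            Jk[i], uk[i] = best_v, best_u
        J = Jk
        policy.insert(0, uk)
    return J, policy

P = {("W0", 0): {"W1": 0.9, "CM1": 0.1},
     ("W1", 0): {"W2": 0.8, "CM1": 0.2},
     ("W2", 0): {"W2": 0.5, "CM1": 0.5},       # oldest state keeps ageing risk
     ("W1", 1): {"PM1": 1.0}, ("W2", 1): {"PM1": 1.0},
     ("PM1", None): {"W0": 1.0}, ("CM1", None): {"W0": 1.0}}

REWARD, CI_CPM, CI_CCM = -5.0, 6.0, 10.0       # hypothetical stage costs

def cost(i, u, j):
    if u == 1 or i.startswith("PM"):
        return CI_CPM                           # CI + CPM
    if i.startswith("CM") or j == "CM1":
        return CI_CCM                           # CI + CCM
    return REWARD                               # generation reward G*Ts*CE

def decisions(i):
    if i in ("W1", "W2"):
        return [0, 1]
    return [0] if i == "W0" else [None]

states = ["W0", "W1", "W2", "PM1", "CM1"]
J0, policy = solve(P, cost, states, decisions, N=3,
                   terminal={s: 0.0 for s in states})
```

In this instance, the optimal decision in the oldest working state W2 is preventive maintenance at the first stage, but not at the last stage, where the remaining horizon is too short for the replacement to pay off.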
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.
This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of renting them can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers:

NC: number of components
NWc: number of working states for component c
NPMc: number of preventive maintenance states for component c
NCMc: number of corrective maintenance states for component c

Costs:

CPMc: cost per stage of preventive maintenance for component c
CCMc: cost per stage of corrective maintenance for component c
CNc(i): terminal cost if component c is in state i

Variables:

ic, c ∈ {1, ..., NC}: state of component c at the current stage
i(NC+1): electricity state at the current stage
jc, c ∈ {1, ..., NC}: state of component c at the next stage
j(NC+1): electricity state at the next stage
uc, c ∈ {1, ..., NC}: decision variable for component c

State and control space:

xc_k, c ∈ {1, ..., NC}: state of component c at stage k
xc: a component state
x(NC+1)_k: electricity state at stage k
uc_k: maintenance decision for component c at stage k

Probability functions:

λc(i): failure probability function for component c

Sets:

Ω_{xc}: state space for component c
Ω_{x(NC+1)}: electricity state space
Ω_{uc}(ic): decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component, to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered, whatever maintenance is done on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x1_k, ..., xNC_k, x(NC+1)_k)ᵀ   (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and x(NC+1)_k represents the electricity state.
Component space: The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space related to component c is denoted Ω_{xc}:

xc_k ∈ Ω_{xc} = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}
Electricity Space
Same as in Section 8.1.
9.2.4.2 Decision Space
At each stage, for each component that is not in maintenance, the decision maker must decide whether to perform preventive maintenance or to do nothing, depending on the state of the system.
57
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNC,k)    (9.3)
The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc},  ∅ otherwise
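As an illustration, the decision space above can be encoded as a small helper. The following Python sketch is hypothetical (the integer state encoding and the function name are assumptions, not part of the model): working states admit the decisions {0, 1}, while a component under maintenance admits no decision.

```python
def admissible_decisions(ic, n_w):
    """Decision space Omega_uc(ic) for one component (illustrative sketch).

    ic  : component state, encoded as an integer where 0..n_w correspond to
          the working states W0..W_NWc and larger values to PM/CM states.
    n_w : index of the last working state, i.e. NWc.
    """
    if ic <= n_w:
        return (0, 1)   # working: do nothing (0) or preventive replacement (1)
    return ()           # in maintenance: no decision possible
```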
9.2.4.3 Transition Probability
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1 | iNC+1)    (9.5)
The transition probabilities of the electricity state, P(jNC+1 | iNC+1), are the same as in the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 8.1.
Component state transitions
The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. Consequently, different cases must be considered.
Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W0, ..., WNWc} and uc = 0, then

P((j1, ..., jNC) | 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc | 0, ic)
Case 2

If one of the components is in maintenance, or preventive maintenance is decided for some component, then

P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with P^c =
  P(jc | 1, ic)  if uc = 1 or ic ∉ {W0, ..., WNWc}
  1              if uc = 0, ic ∈ {W0, ..., WNWc} and jc = ic
  0              otherwise
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic, jc ∈ {W0, ..., WNWc} and uc = 0, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is incurred, together with the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑_{c=1}^{NC} Cc

with Cc =
  CCMc  if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
  CPMc  if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
  0     otherwise
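The cost function can be sketched in the same style. The state encoding and all parameter names below are assumptions of this sketch, not part of the thesis model; the production reward is written with a negative sign here because the sketch treats everything as a cost to be minimized.

```python
def stage_cost(i, u, j, ci, c_cm, c_pm, g, ts, ce):
    """Stage cost of the multi-component model (illustrative sketch).

    States are encoded as tuples such as ('W', 0), ('PM', 1), ('CM', 2),
    mirroring the labels of the state space. ci is the interruption cost
    CI; c_cm[c] and c_pm[c] are the per-stage CM/PM costs of component c;
    g, ts, ce are the production G (kW), stage length Ts (h) and the
    electricity price CE.
    """
    working_before = all(s[0] == 'W' for s in i)
    no_pm = all(uc == 0 for uc in u)
    no_failure = all(s[0] == 'W' for s in j)
    if working_before and no_pm and no_failure:
        # Case 1: the unit produces G * Ts kWh sold at price CE (a reward)
        return -g * ts * ce
    # Case 2: interruption cost plus the maintenance cost of each component
    cost = ci
    for c, (ic, jc) in enumerate(zip(i, j)):
        if ic[0] == 'CM' or jc == ('CM', 1):
            cost += c_cm[c]
        elif ic[0] == 'PM' or jc == ('PM', 1):
            cost += c_pm[c]
    return cost
```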
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas that could impact the model:
• Manpower: It would be interesting to limit the number of maintenance actions that can be performed at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding maintenance decisions to the model.

• Non-deterministic time to repair: A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Deterioration states: If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods for solving infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm was empirically shown to converge fastest. However, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities over a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the recent advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP could, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts over a finite time are common. From this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either a finite horizon model directly, or a discounted infinite horizon model, which approximates the finite horizon model but must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with several monitored parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of the complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
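The backward recursion above can be checked mechanically. The following Python sketch re-runs the value iteration on the arc costs of the example; the dictionary layout is an implementation choice, not part of the thesis.

```python
# Arc costs C[k][(i, u)] of the shortest path example; the decision u is the
# index of the next-stage node, so the dynamic function is f_k(i, u) = u.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},
    1: {(0, 0): 4, (0, 1): 6,
        (1, 0): 2, (1, 1): 1, (1, 2): 3,
        (2, 1): 5, (2, 2): 2},
    2: {(0, 0): 2, (0, 1): 5,
        (1, 0): 7, (1, 1): 3, (1, 2): 2,
        (2, 1): 1, (2, 2): 2},
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},
}

N = 4
J = {N: {0: 0}}                      # terminal cost phi(K) = 0
policy = {}
for k in range(N - 1, -1, -1):       # backward value iteration
    J[k], policy[k] = {}, {}
    for i in {s for (s, _) in C[k]}:
        costs = {u: c + J[k + 1][u] for (s, u), c in C[k].items() if s == i}
        u_star = min(costs, key=costs.get)
        J[k][i], policy[k][i] = costs[u_star], u_star

# Forward pass: recover the optimal state sequence from the policy
state, path = 0, [0]
for k in range(N):
    state = policy[k][state]
    path.append(state)
```

Running the sketch reproduces J*0(0) = 8 and the state sequence (0, 2, 2, 1, 0), i.e. the path A ⇒ D ⇒ G ⇒ I ⇒ K.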
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Alagar Rangan, Dimple Thyagarajan, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
3. Distribution: The distribution system is the voltage level below transmission, and it connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).
4. Consumption: The consumers can be divided into different categories: industry, commercial, household, office, agriculture, etc. The interruption costs are in general different for the different categories of consumers. These costs also depend on the duration of the outage.
The trade of electricity between producers and consumers is made through different specific markets around the world. The rules and organization differ for each market place. Bids for electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.
The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (the system operator coordinates the necessary actions to avoid dangerous situations). The components of the system influence each other: if a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.
3.1.2 Maintenance in Power Systems
The objective is to find the right way to do maintenance: Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.
Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses, for example, on wind power (see [39], [32]).
Research about power generation typically focuses on predictive maintenance using condition-based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition-based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition-based monitoring systems.
3.2 Costs
Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: Cost for the maintenance team that performs the maintenance actions.

• Spare part cost: The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: Special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).
3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: The size and availability of the maintenance staff are limited.

• Maintenance equipment: The equipment needed for undertaking the maintenance must be available.

• Weather: The weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to perform maintenance on offshore wind farms.

• Availability of spare parts: If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs to an optimization model.

• Statistical data: Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of the system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that satisfies the principle of optimality:
An optimal policy has the property that whatever the initial state andoptimal first decision may be the remaining decisions constitute an op-timal policy with regard to the state resulting from the first decision[8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system: the previous decisions should not influence the evolution of the system or the possible actions.
Basically, in maintenance problems this means that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.
If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic, or stochastic.
Functional failures are in general represented as stochastic events. Consequently, stochastic maintenance optimization models are of interest.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the considered time horizon.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner at all times. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 7). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the length of the interval between two stages will influence the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is long. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuum of decision epochs implies that decisions can be made either continuously, at some points chosen by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decision-making refers to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 4.2).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximate solutions of DP problems. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example: a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).
Dynamic and Cost Functions
As a result of this action, the state of the system at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost that the decision maker has to pay, Ck(i, u). A possible terminal cost CN(XN) is associated with the terminal state (the state at stage N).
Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J*0(X0) = min_{Uk} { Σ_{k=0}^{N−1} Ck(Xk, Uk) + CN(XN) }

subject to Xk+1 = fk(Xk, Uk), k = 0, ..., N − 1

N         Number of stages
k         Stage
i         State at the current stage
j         State at the next stage
Xk        State at stage k
Uk        Decision (action) at stage k
Ck(i, u)  Cost function
CN(i)     Terminal cost for state i
fk(i, u)  Dynamic function
J*0(i)    Optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and the Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*k(i) = min_{u∈ΩUk(i)} {Ck(i, u) + J*k+1(fk(i, u))}    (4.1)

J*k(i): optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J*N(i) = CN(i),  ∀i ∈ ΩXN
J*k(i) = min_{u∈ΩUk(i)} {Ck(i, u) + J*k+1(fk(i, u))},  ∀i ∈ ΩXk
U*k(i) = argmin_{u∈ΩUk(i)} {Ck(i, u) + J*k+1(fk(i, u))},  ∀i ∈ ΩXk

u: decision variable
U*k(i): optimal decision (action) at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
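The backward sweep can be written down directly from the equations above. The sketch below is a generic Python implementation; the function and argument names are choices of this sketch, not prescribed by the algorithm.

```python
import math

def value_iteration(N, states, actions, f, cost, terminal_cost):
    """Finite-horizon deterministic DP solved by backward value iteration.

    states(k)        -> iterable of admissible states at stage k
    actions(k, i)    -> iterable of admissible decisions in state i at stage k
    f(k, i, u)       -> next state (dynamic function)
    cost(k, i, u)    -> stage cost C_k(i, u)
    terminal_cost(i) -> terminal cost C_N(i)
    Returns the cost-to-go table J and the optimal policy U.
    """
    J = {(N, i): terminal_cost(i) for i in states(N)}
    U = {}
    for k in range(N - 1, -1, -1):      # backward: k = N-1, ..., 0
        for i in states(k):
            best_u, best_v = None, math.inf
            for u in actions(k, i):
                v = cost(k, i, u) + J[(k + 1, f(k, i, u))]
                if v < best_v:
                    best_u, best_v = u, v
            J[(k, i)], U[(k, i)] = best_v, best_u
    return J, U
```

Calling it on any small problem with tabulated dynamics and costs returns both the optimal cost-to-go for every (stage, state) pair and the optimal decision table.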
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: a five-stage shortest path network. Stage 0 contains node A; stage 1 the nodes B, C, D; stage 2 the nodes E, F, G; stage 3 the nodes H, I, J; stage 4 the node K. Each arc is labelled with its cost; the arc costs appear in the calculations of Appendix A.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths. For example, the path A-B-F-J-K has a cost of 2 + 6 + 2 + 7 = 17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}
Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to the next stage. The following notation is used:

ΩUk(i) = {0, 1} for i = 0;  {0, 1, 2} for i = 1;  {1, 2} for i = 2,   for k = 1, 2, 3

ΩU0(0) = {0, 1, 2}   for k = 0
For example, ΩU1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E and U1(0) = 1 for the transition B ⇒ F.

Another example: ΩU1(2) = ΩU(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F and u1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ0, μ1, ..., μN}, where μk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*0, μ*1, ..., μ*N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notation used: fk(i, u) = u.

The transition costs are defined as the distance from one state to the state resulting from the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J*0(0) = min_{Uk∈ΩUk(Xk)} { Σ_{k=0}^{4} Ck(Xk, Uk) + CN(XN) }

subject to Xk+1 = fk(Xk, Uk), k = 0, 1, ..., N − 1
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated at the last stage and then iterated backwards until the initial stage is reached. The optimal decision sequence is then obtained forwards, by using the optimal decisions determined by the DP algorithm along the sequence of states that will be visited.
The solutions of the algorithm are given in Appendix A.
The optimal cost-to-go is J*0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ0, μ1, μ2, μ3, μ4} with μk(i) = u*k(i) (for example, μ1(1) = 2 and μ1(2) = 2).
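The backward recursion for this example can be sketched in a few lines of Python. The stage graph below follows the structure of the example (A; B, C, D; E, F, G; H, I, J; K), but the arc costs are hypothetical placeholders: the actual distances are given in the figure of the example, which is not reproduced here, so the optimal cost and path found by the sketch need not match the values above.

```python
# Backward value iteration for a deterministic shortest-path problem with the
# stage structure of the example (A; B,C,D; E,F,G; H,I,J; K).  The arc costs
# below are HYPOTHETICAL placeholders for the distances given in the figure.
# arcs[k][(i, u)] = (next_state, cost): dynamic f_k(i, u) = u, cost C_k(i, u).
arcs = [
    {(0, 0): (0, 2), (0, 1): (1, 5), (0, 2): (2, 1)},             # A -> B, C, D
    {(0, 0): (0, 4), (0, 1): (1, 3),                              # B -> E, F
     (1, 0): (0, 2), (1, 1): (1, 6), (1, 2): (2, 3),              # C -> E, F, G
     (2, 1): (1, 7), (2, 2): (2, 2)},                             # D -> F, G
    {(0, 0): (0, 3), (0, 1): (1, 2),                              # E -> H, I
     (1, 0): (0, 4), (1, 1): (1, 1), (1, 2): (2, 5),              # F -> H, I, J
     (2, 1): (1, 3), (2, 2): (2, 4)},                             # G -> I, J
    {(0, 0): (0, 1), (1, 0): (0, 2), (2, 0): (0, 6)},             # H, I, J -> K
]

def solve(arcs):
    """Backward recursion: cost-to-go J[k][i] and optimal decisions U[k][i]."""
    n = len(arcs)
    J = [dict() for _ in range(n)] + [{0: 0.0}]   # terminal cost C_N(K) = 0
    U = [dict() for _ in range(n)]
    for k in range(n - 1, -1, -1):
        for i in {s for (s, _) in arcs[k]}:
            J[k][i], U[k][i] = min((c + J[k + 1][j], u)
                                   for (s, u), (j, c) in arcs[k].items() if s == i)
    return J, U

J, U = solve(arcs)
i, path = 0, [0]
for k in range(len(arcs)):                        # forward pass: optimal path
    i = arcs[k][(i, U[k][i])][0]
    path.append(i)
# With the made-up costs above: J[0][0] == 8.0 and path == [0, 0, 1, 1, 0]
```

With the actual distances of the figure filled in, the same recursion reproduces J*0(0) = 8 and the path A ⇒ D ⇒ G ⇒ I ⇒ K.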
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 3 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 3. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, that is, the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ ΩXk.
Decision Space
At each decision epoch, the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).
Dynamic of the System and Transition Probability
In contrast to the deterministic case, the state transition does not depend only on the control used but also on a disturbance ω = ωk(i, u):
Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N − 1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k + 1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:
Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:
P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:
Ck(j, u, i) = Ck(Xk+1 = j, Uk = u, Xk = i)
If the transition (i, j) occurs at stage k when the decision is u, then a cost Ck(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).
A terminal cost CN(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:
J*(X0) = min_{Uk∈ΩUk(Xk)} E[ CN(XN) + Σ_{k=0}^{N−1} Ck(Xk+1, Uk, Xk) ]

Subject to: Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, ..., N − 1
N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
Xk: state at stage k
Uk: decision (action) at stage k
ωk(i, u): probabilistic function of the disturbance
Ck(j, u, i): cost function
CN(i): terminal cost for state i
fk(i, u, ω): dynamic function
J*0(i): optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:

J*k(i) = min_{u∈ΩUk(i)} E[ Ck(i, u) + J*k+1(fk(i, u, ω)) ]    (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*k(i) = min_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]    (5.2)
ΩXk: state space at stage k
ΩUk(i): decision space at stage k for state i
Pk(j, u, i): transition probability function
5.3 Value Iteration Method
The value iteration (VI) algorithm for SDP problems is directly based on Equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
J*N(i) = CN(i), ∀i ∈ ΩXN (initialization)

While k ≥ 0 do:

J*k(i) = min_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)], ∀i ∈ ΩXk

U*k(i) = argmin_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)], ∀i ∈ ΩXk

k ← k − 1
u: decision variable
U*k(i): optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
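The backward recursion above can be sketched as follows on a made-up two-state component (working/failed) with a do-nothing/replace decision; the failure probability, costs, and horizon are all hypothetical and chosen only to illustrate the algorithm.

```python
# Minimal sketch of the finite-horizon value-iteration recursion, applied to a
# made-up two-state component: state 0 = working, state 1 = failed.  The
# failure probability, costs, and horizon are hypothetical illustrations.

N = 4                       # number of stages
P_FAIL = 0.1                # per-stage failure probability without maintenance

def P(j, u, i):
    """Transition probability P_k(j, u, i); u = 0: do nothing, u = 1: replace."""
    if u == 1:              # replacement brings the component back to state 0
        return 1.0 if j == 0 else 0.0
    if i == 0:              # a working component may fail
        return P_FAIL if j == 1 else 1.0 - P_FAIL
    return 1.0 if j == 1 else 0.0    # a failed component stays failed

def C(j, u, i):
    """Transition cost C_k(j, u, i): replacement cost plus failure penalty."""
    return (5.0 if u == 1 else 0.0) + (20.0 if j == 1 else 0.0)

states, controls = (0, 1), (0, 1)
J = {i: 0.0 for i in states}             # terminal cost C_N(i) = 0
policy = []
for k in range(N - 1, -1, -1):           # backward recursion
    Jk, Uk = {}, {}
    for i in states:
        q = {u: sum(P(j, u, i) * (C(j, u, i) + J[j]) for j in states)
             for u in controls}
        Uk[i] = min(q, key=q.get)
        Jk[i] = q[Uk[i]]
    J, policy = Jk, [Uk] + policy
# policy[k][i] is U*_k(i); with these numbers, replacing is optimal whenever
# the component is failed, and doing nothing is optimal while it works.
```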
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with
• N stages,

• NX state variables, where the size of the set for each state variable is S,

• NU control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·NX) · A^(NU)). The complexity of the problem thus increases exponentially with the size of the problem (the number of state and decision variables). This characteristic of SDP is called the curse of dimensionality.
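To make the growth concrete, the operation count can be evaluated for a hypothetical weekly maintenance problem (N = 52 stages); the numbers below only illustrate the formula, not any particular model.

```python
# The operation count O(N * S**(2*NX) * A**NU) made concrete: adding one state
# variable multiplies the work by S**2, adding one control variable by A.
def vi_operations(N, S, NX, A, NU):
    return N * S ** (2 * NX) * A ** NU

print(vi_operations(52, 10, 1, 2, 1))   # 1 state variable:  10400
print(vi_operations(52, 10, 2, 2, 1))   # 2 state variables: 1040000
```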
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.
Of course, maintenance states should be considered in both cases. It could also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while after a major failure a component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is low consumption, some generation units are stopped, and this time can be used for the maintenance of the power plant.
Weather forecasts could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions at offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamic of the system only depends on the current state of the system (and possibly on the time, if the system dynamic is not stationary).
This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the relevant preceding states in memory. The computational price is once again very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamic of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamic of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov decision process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice one scarcely faces problems with an infinite number of stages. An infinite horizon can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space. For i ∈ ΩX, μ(i) is an admissible control for the state i: μ(i) ∈ ΩU(i).
The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.
J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

Subject to: Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1
μ: decision policy
J*(i): optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is the discount factor (0 < α < 1). The cost function for discounted IHSDP has the form α^k · Cij(u).
Since Cij(u) is bounded, the infinite sum converges (as a decreasing geometric progression):
J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(Xk+1, μ(Xk), Xk) ]

Subject to: Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1
α: discount factor
Average cost per stage problems
Infinite horizon problems can sometimes neither be represented with a cost-free termination state nor be discounted.
To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

Subject to: Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1
6.2 Optimality Equations
The optimality equations are formulated using the probability function P(j, u, i).
The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):
J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + J*(j)], ∀i ∈ ΩX
Jμ(i): cost-to-go function of policy μ starting from state i
J*(i): optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + α · J*(j)], ∀i ∈ ΩX
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea is to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed does. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.
An alternative to this method is the policy iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively. The process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ0. It can then be described by the following steps.
Step 1: Policy evaluation

If μq+1 = μq, stop the algorithm. Otherwise, Jμq(i), the solution of the following linear system, is calculated:

Jμq(i) = Σ_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + Jμq(j)], ∀i ∈ ΩX

q: iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μq.
Step 2: Policy improvement

A new policy is obtained using one step of the value iteration algorithm:

μq+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jμq(j)], ∀i ∈ ΩX

Go back to the policy evaluation step.
The process stops when μq+1 = μq.
At each iteration the algorithm improves the policy. If the initial policy μ0 is already good, then the algorithm converges quickly to the optimal solution.
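As an illustration, the two steps can be sketched on a tiny invented MDP. A discounted cost (factor α) is assumed here so that the policy-evaluation linear system always has a unique solution, and the costs are taken to depend only on (i, u), a simplification of the C(j, u, i) notation of the text; all numbers are made up.

```python
# Policy iteration on an invented two-state, two-action discounted MDP.
# State 0 = good condition, state 1 = degraded; action 0 = do nothing,
# action 1 = repair.  Probabilities and costs are hypothetical.
ALPHA = 0.9
STATES, CONTROLS = (0, 1), (0, 1)
P = {0: {0: (0.8, 0.2), 1: (1.0, 0.0)},     # P[i][u][j]
     1: {0: (0.0, 1.0), 1: (0.9, 0.1)}}
C = {0: {0: 0.0, 1: 8.0}, 1: {0: 10.0, 1: 4.0}}   # C[i][u]

def solve_linear(A, b):
    """Tiny Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b_ for a, b_ in zip(M[r], M[col])]
    return [M[k][n] / M[k][k] for k in range(n)]

def evaluate(mu):
    """Step 1: solve the linear system J = C_mu + ALPHA * P_mu * J."""
    A = [[(1.0 if i == j else 0.0) - ALPHA * P[i][mu[i]][j] for j in STATES]
         for i in STATES]
    return solve_linear(A, [C[i][mu[i]] for i in STATES])

def improve(J):
    """Step 2: greedy policy with respect to the evaluated cost-to-go."""
    return [min(CONTROLS, key=lambda u: C[i][u]
                + ALPHA * sum(P[i][u][j] * J[j] for j in STATES))
            for i in STATES]

mu = [0, 0]                      # initial policy mu_0: never repair
while True:
    J = evaluate(mu)
    new_mu = improve(J)
    if new_mu == mu:             # a policy that is a solution of its own
        break                    # improvement is optimal
    mu = new_mu
```

With these numbers the iteration terminates at the policy "do nothing when good, repair when degraded".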
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, in each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the true value Jμk(i).

While m ≥ 0 do:

J^m_μk(i) = Σ_{j∈ΩX} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_μk(j)], ∀i ∈ ΩX

m ← m − 1

m: number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jμk is approximated by J^0_μk.
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X ∈ ΩX, there are a unique scalar λμ and a vector hμ such that:

hμ(X) = 0

λμ + hμ(i) = Σ_{j∈ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)], ∀i ∈ ΩX

This λμ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost λ* and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ ΩX

μ*(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ ΩX
6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. X is an arbitrary state and h0(i) is chosen arbitrarily:
Hk = min_{u∈ΩU(X)} Σ_{j∈ΩX} P(j, u, X) · [C(j, u, X) + hk(j)]

hk+1(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk, ∀i ∈ ΩX

μk+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)], ∀i ∈ ΩX
The sequence hk converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
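A minimal sketch of relative value iteration, on an invented two-state, two-action MDP that is unichain under every policy; the costs depend only on (i, u), a simplification of the C(j, u, i) used in the text, and all numbers are made up.

```python
# Relative value iteration on an invented unichain MDP.
STATES, CONTROLS, XBAR = (0, 1), (0, 1), 0     # XBAR: the arbitrary state X

# P[i][u][j]: transition probabilities; every state communicates with every
# other under all policies, so every stationary policy is unichain.
P = {0: {0: (0.8, 0.2), 1: (0.95, 0.05)},
     1: {0: (0.1, 0.9), 1: (0.9, 0.1)}}
C = {0: {0: 0.0, 1: 2.0}, 1: {0: 6.0, 1: 3.0}}  # hypothetical costs C[i][u]

def backup(h, i):
    """min over u of the one-stage cost plus expected differential cost h."""
    return min(C[i][u] + sum(P[i][u][j] * h[j] for j in STATES)
               for u in CONTROLS)

h = [0.0, 0.0]                         # h_0 chosen arbitrarily
for _ in range(1000):
    H = backup(h, XBAR)                # H_k: converges to the average cost
    h = [backup(h, i) - H for i in STATES]

lam = backup(h, XBAR)                  # ~ optimal average cost per stage
mu = [min(CONTROLS, key=lambda u: C[i][u]
          + sum(P[i][u][j] * h[j] for j in STATES))
      for i in STATES]                 # greedy policy from the converged h
```

For this model the fixed point can be checked by hand: the policy (do nothing in state 0, repair in state 1) has average cost 6/11 per stage.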
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: the initial policy μ0 and the state X can be chosen arbitrarily.

Step 1: Policy evaluation

If λq+1 = λq and hq+1(i) = hq(i) ∀i ∈ ΩX, stop the algorithm. Else, solve the system of equations:

hq(X) = 0

λq + hq(i) = Σ_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + hq(j)], ∀i ∈ ΩX

Step 2: Policy improvement

μq+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hq(j)], ∀i ∈ ΩX

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case, the optimality equation is:

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ ΩX

J*(i) is the solution of the following linear programming model:

Maximize: Σ_{i∈ΩX} J(i)

Subject to: J(i) − α · Σ_{j∈ΩX} P(j, u, i) · J(j) ≤ Σ_{j∈ΩX} P(j, u, i) · C(j, u, i), ∀i, u
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m; a DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, the algorithm converges quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). For some applications, however, the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch occurs each time the state of the system changes. Such problems are referred to as semi-Markov decision processes (SMDP).
SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and the actions are not taken continuously (that kind of problem refers to optimal control theory).
SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDP could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Process - Reinforcement Learning
Reinforcement learning (RL), or approximate dynamic programming (ADP), is an approach from machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly.
The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space, and are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation: assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.
The cost-to-go resulting from the trajectory, starting from the state Xk, is:

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V(Xk): cost-to-go of a trajectory starting from state Xk
If a number of trajectories have been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} V(im)

V(im): cost-to-go of the trajectory starting from state i at its m-th visit
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(im) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(Xk) := J(Xk) + γXk · [V(Xk) − J(Xk)]

γXk corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) must be calculated from the whole trajectory, so the updates can only be made once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = C(Xk, Xk+1) + V(Xk+1).
At each transition of the trajectory, the cost-to-go function of every state visited so far in the trajectory is updated. Assume that the l-th transition has just been generated. Then J(Xk) is updated for all the states visited previously during the trajectory:

J(Xk) := J(Xk) + γXk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) := J(Xk) + γXk · λ^(l−k) · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0, for which only the current state is updated. The TD(0) algorithm is:

J(Xk) := J(Xk) + γXk · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
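The TD(0) update can be sketched on an invented three-state shortest-path chain where the exact cost-to-go under the fixed policy is known (J(0) = 2, J(1) = 2, J(2) = 0), so the estimate can be checked; the transition structure and costs are made up for the illustration.

```python
import random

# TD(0) evaluation of a fixed policy on an invented stochastic shortest-path
# chain.  States 0 and 1 are transient, state 2 is the cost-free terminal
# state.  Under this policy the exact cost-to-go is J(0)=2, J(1)=2, J(2)=0.
random.seed(0)

def step(i):
    """One transition (next_state, cost) under the fixed policy."""
    if i == 0:
        return (1 if random.random() < 0.5 else 2), 1.0
    return 2, 2.0           # state 1 always moves to the terminal state

J = [0.0, 0.0, 0.0]
visits = [0, 0, 0]
for _ in range(20000):                      # simulated trajectories from 0
    i = 0
    while i != 2:
        j, c = step(i)
        visits[i] += 1
        gamma = 1.0 / visits[i]             # step size 1/m, as in the text
        J[i] += gamma * (c + J[j] - J[i])   # TD(0) update
        i = j
```

Here J(1) is hit exactly after its first update, while J(0) converges by averaging noisy bootstrapped targets.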
Q-factors
Once Jμk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Qμk(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jμk(j)]

Note that P(j, u, i) and C(j, u, i) must be known. The improved policy is:

μk+1(i) = argmin_{u∈ΩU(i)} Qμk(i, u)

It is in fact an approximate version of the policy iteration algorithm, since Jμk and Qμk have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)    (7.2)

By combining the two equations, we obtain:

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]    (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily.

For each sample (Xk, Xk+1, Uk, Ck) do:

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
The exploration/exploitation trade-off: convergence of the algorithm to the optimal solution would require that all the pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
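A tabular Q-learning sketch with ε-greedy exploration on an invented three-state shortest-path problem; a constant step size γ is used instead of the decreasing 1/m of the text, which is sufficient for this deterministic example, and all transitions and costs are hypothetical.

```python
import random

# Tabular Q-learning with epsilon-greedy exploration.  State 2 is terminal;
# in state 0, action 0 ends the episode at cost 2 while action 1 detours
# through state 1.  All numbers are made up.
random.seed(1)
GAMMA, EPS = 0.5, 0.2
TERMINAL = 2

# model used only to GENERATE samples: (next_state, cost) per (state, action)
model = {(0, 0): (2, 2.0), (0, 1): (1, 0.0),
         (1, 0): (2, 4.0), (1, 1): (0, 1.0)}

Q = {0: [0.0, 0.0], 1: [0.0, 0.0]}

def greedy(i):
    return min((0, 1), key=lambda u: Q[i][u])

for _ in range(5000):                    # episodes, each starting in state 0
    i, steps = 0, 0
    while i != TERMINAL and steps < 100:
        # epsilon-greedy: mostly exploit the greedy policy, sometimes explore
        u = random.choice((0, 1)) if random.random() < EPS else greedy(i)
        j, c = model[(i, u)]
        best_next = 0.0 if j == TERMINAL else min(Q[j])
        Q[i][u] = (1 - GAMMA) * Q[i][u] + GAMMA * (c + best_next)
        i, steps = j, steps + 1
# The Q-factors converge to Q*(0,·) = (2, 3) and Q*(1,·) = (4, 3), so the
# learned greedy policy stops immediately in state 0 and detours in state 1.
```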
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

• using the direct learning approach presented in the preceding section on each sample of experience;

• building on-line a model of the transition probabilities and the cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; for large state and control spaces, they would be computationally too intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function Jμ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the tabular representation investigated previously, Jμ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.
Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) − J(i, r).
There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, and Bayesian statistics.
A general approach to a supervised learning problem can be:
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Choose a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
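A minimal sketch of the approximation idea: a parametric structure J(i, r) = r0 + r1 · φ(i) fitted by least squares to synthetic samples of a cost-to-go function. Everything here (the feature, the samples, and the "true" underlying function) is invented for the illustration.

```python
import random

# Fit J(i, r) = r0 + r1 * phi(i) by least squares to sampled cost-to-go
# values.  The samples are synthetic: noisy evaluations of a made-up Jmu.
random.seed(2)

def feature(i):
    return float(i)         # single hand-picked feature: the state itself

# synthetic training set of (state, observed cost-to-go) pairs
samples = [(i, 3.0 + 0.5 * i + random.gauss(0.0, 0.1))
           for i in range(20) for _ in range(5)]

# closed-form least squares for the two parameters r0 (intercept), r1 (slope)
n = len(samples)
sx = sum(feature(i) for i, _ in samples)
sy = sum(y for _, y in samples)
sxx = sum(feature(i) ** 2 for i, _ in samples)
sxy = sum(feature(i) * y for i, y in samples)
r1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
r0 = (sy - r1 * sx) / n

def J_approx(i):
    """Only the vector r = (r0, r1) is stored, not a table over all states."""
    return r0 + r1 * feature(i)
```

The same fit could be redone after each batch of new samples, which is the role the supervised learner plays inside an RL loop.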
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure during the stage of a unit that is not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the calculated state probabilities and the optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method.
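For a discounted MDP over costs, the linear programming method takes a standard form. A sketch, in generic notation (not the specific notation of [21]): since the optimal cost-to-go is the largest function satisfying the Bellman inequalities, it solves

```latex
% LP formulation of a discounted cost MDP (generic notation):
% one variable V(i) per state, discount factor 0 < \gamma < 1.
\begin{align*}
\max_{V} \quad & \sum_{i} V(i) \\
\text{s.t.} \quad & V(i) \le C(i,u) + \gamma \sum_{j} P(j \mid i,u)\, V(j)
  && \forall i,\ \forall u \in \Omega_U(i)
\end{align*}
```

One advantage noted later in this chapter is that additional linear constraints can simply be appended to this program.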
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are not only minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; it could then be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require a model of the system to exist: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Disadvantage: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; classical methods, with several possible approaches:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI), which can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; Policy Iteration (PI), which is faster in general
  - Shortest path: Linear Programming, which allows possible additional constraints; state space limited for VI & PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces
  Possible application: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantage: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval, but more complex
  Possible application: optimization for inspection-based maintenance
  Methods: same as MDP (average cost-to-go approach)
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both these models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.
9.1.2 Notations for the Proposed Model

Numbers
NE: number of electricity scenarios
NW: number of working states for the component
NPM: number of preventive maintenance states for one component
NCM: number of corrective maintenance states for one component

Costs
CE(s, k): electricity price at stage k in electricity state s
CI: cost per stage for interruption
CPM: cost per stage of preventive maintenance
CCM: cost per stage of corrective maintenance
CN(i): terminal cost if the component is in state i

Variables
i1: component state at the current stage
i2: electricity state at the current stage
j1: possible component state for the next stage
j2: possible electricity state for the next stage

State and control space
x1k: component state at stage k
x2k: electricity state at stage k

Probability functions
λ(t): failure rate of the component at age t
λ(i): failure rate of the component in state Wi

Sets
Ωx1: component state space
Ωx2: electricity state space
ΩU(i): decision space for state i

State notations
W: working state
PM: preventive maintenance state
CM: corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.
• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.
• If the system is not working, a cost for interruption CI per stage is considered.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k)ᵀ,  x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; Tmax can then, for example, correspond to the age at which λ(t) exceeds 50%. This second approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
[Figure: Markov chain over the states W0–W4, PM1, CM1 and CM2. Under u = 0 (solid lines), each Wq goes to Wq+1 (or stays in W4) with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q). Under u = 1 (dashed lines), each Wq goes to PM1 with probability 1. The PM and CM states chain deterministically (probability 1) back to W0.]
Figure 9.1: Example of a Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}
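As an illustration, the construction of Ωx1 can be sketched in a few lines (a sketch, not code from the thesis; states are represented as strings):

```python
def component_state_space(n_w, n_pm, n_cm):
    # Working states W0..W_NW, then PM1..PM_(NPM-1) and CM1..CM_(NCM-1);
    # PM_NPM and CM_NCM are identified with W0 (component "as new").
    states = ["W%d" % q for q in range(n_w + 1)]
    states += ["PM%d" % q for q in range(1, n_pm)]
    states += ["CM%d" % q for q in range(1, n_cm)]
    return states

# Example of Figure 9.1: NW = 4, NPM = 2, NCM = 3.
print(component_state_space(4, 2, 3))
# ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2']
```

The output reproduces the state set of Figure 9.1.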
Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
[Figure: electricity price (SEK/MWh, from 200 to 500) as a function of the stage (k−1, k, k+1) for Scenarios 1, 2 and 3.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                           u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0   CM1     λ(Wq)
WNW                          0   WNW     1 − λ(WNW)
WNW                          0   CM1     λ(WNW)
Wq, q ∈ {0, ..., NW}         1   PM1     1
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1   1
PM(NPM−1)                    ∅   W0      1
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1   1
CM(NCM−1)                    ∅   W0      1
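Table 9.1 can be encoded directly. The sketch below (a hypothetical helper, with the per-stage failure probability in Wq written lam(q)) builds the non-zero transition probabilities P(j1, u, i1) and checks that every row sums to one:

```python
def transitions(n_w, n_pm, n_cm, lam):
    # Returns {(i1, u): {j1: prob}} following Table 9.1.
    # lam(q) is the per-stage failure probability in state Wq.
    # (Assumes n_pm >= 2; for NPM = 1, PM1 is identified with W0.)
    P = {}
    for q in range(n_w + 1):
        nxt = "W%d" % min(q + 1, n_w)          # W_NW is absorbing under u=0
        P[("W%d" % q, 0)] = {nxt: 1.0 - lam(q), "CM1": lam(q)}
        P[("W%d" % q, 1)] = {"PM1": 1.0}       # preventive replacement
    for q in range(1, n_pm):                    # deterministic PM sequence
        nxt = "PM%d" % (q + 1) if q + 1 < n_pm else "W0"
        P[("PM%d" % q, None)] = {nxt: 1.0}
    for q in range(1, n_cm):                    # deterministic CM sequence
        nxt = "CM%d" % (q + 1) if q + 1 < n_cm else "W0"
        P[("CM%d" % q, None)] = {nxt: 1.0}
    return P

# Example of Figure 9.1 with a hypothetical failure probability.
P = transitions(4, 2, 3, lam=lambda q: 0.1 * (q + 1))
assert all(abs(sum(row.values()) - 1.0) < 1e-12 for row in P.values())
print(len(P))   # number of (state, decision) rows
# 13
```

The row-sum check mirrors the fact that each line block of Table 9.1 describes a full probability distribution.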
Table 9.2: Example of transition matrices for the electricity scenarios

P1E =
  1    0    0
  0    1    0
  0    0    1

P2E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):   0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):  P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
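The effect of the schedule in Table 9.3 can be checked numerically. The sketch below propagates a scenario distribution through the matrices of Table 9.2; after the P2E stages the distribution is uniform, and it stays uniform through the remaining P3E and P1E stages (both matrices are doubly stochastic):

```python
P1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P2 = [[1 / 3] * 3 for _ in range(3)]
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
schedule = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]  # Table 9.3

def step(dist, P):
    # One stage of the scenario chain: dist_{k+1}(j) = sum_i dist_k(i) * P[i][j]
    return [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

dist = [1.0, 0.0, 0.0]          # start in scenario S1 (e.g. a "dry year")
for P in schedule:
    dist = step(dist, P)
print([round(p, 4) for p in dist])
# [0.3333, 0.3333, 0.3333]
```

This matches the interpretation in Section 9.1.1: the scenario is transient during part of the year (here the P2E/P3E stages) and stable otherwise.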
9.1.4.4 Cost Function
The costs associated with the possible transitions are of different kinds:

• reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• cost for maintenance: CCM or CPM
• cost for interruption: CI
Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                           u   j1      Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1    G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0   CM1     CI + CCM
WNW                          0   WNW     G · Ts · CE(i2, k)
WNW                          0   CM1     CI + CCM
Wq                           1   PM1     CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1   CI + CPM
PM(NPM−1)                    ∅   W0      CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1   CI + CCM
CM(NCM−1)                    ∅   W0      CI + CCM
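Putting Tables 9.1 and 9.4 together, the one-component model can be solved by backward induction (finite horizon value iteration). The sketch below uses small hypothetical numbers, a single electricity scenario, and treats the production reward as a negative cost; it only shows the mechanics, not the data of the thesis:

```python
# Hypothetical instance: NW = 2, NPM = 1, NCM = 2, one electricity
# scenario, N = 6 stages. Rewards enter as negative costs.
N, NW, NPM, NCM = 6, 2, 1, 2
G_Ts_CE = 10.0                      # reward G*Ts*CE per producing stage
CI, CPM, CCM = 5.0, 8.0, 20.0      # interruption / maintenance costs per stage
lam = {0: 0.05, 1: 0.15, 2: 0.40}  # per-stage failure probability in Wq

states = ["W0", "W1", "W2", "CM1"]  # NPM = 1, so PM1 is identified with W0

def options(i):
    # (u, transition probs, transition costs) following Tables 9.1 and 9.4
    if i.startswith("W"):
        q = int(i[1:])
        nxt = "W%d" % min(q + 1, NW)
        keep = (0, {nxt: 1 - lam[q], "CM1": lam[q]},
                   {nxt: -G_Ts_CE, "CM1": CI + CCM})
        replace = (1, {"W0": 1.0}, {"W0": CI + CPM})
        return [keep, replace]
    q = int(i[2:])                  # corrective maintenance sequence
    nxt = "CM%d" % (q + 1) if q + 1 < NCM else "W0"
    return [(None, {nxt: 1.0}, {nxt: CI + CCM})]

J = {i: 0.0 for i in states}        # terminal costs C_N(i) = 0 here
policy = []
for k in range(N - 1, -1, -1):      # backward through the stages
    Jk, uk = {}, {}
    for i in states:
        Jk[i], uk[i] = min(
            (sum(p * (cost[j] + J[j]) for j, p in prob.items()), u)
            for u, prob, cost in options(i))
    J, policy = Jk, [uk] + policy

print(round(J["W0"], 2), policy[0]["W0"])
```

With these numbers the cost-to-go of a new component is negative (producing is profitable) and a failed component is always worse off than a new one, which is a quick sanity check on the tables.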
9.2 Multi-Component Model
In this section the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would soon need to be maintained.
This could be very interesting if the interruption cost is high, or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rental can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers
NC: number of components
NWc: number of working states for component c
NPMc: number of preventive maintenance states for component c
NCMc: number of corrective maintenance states for component c

Costs
CPMc: cost per stage of preventive maintenance for component c
CCMc: cost per stage of corrective maintenance for component c
CNc(i): terminal cost if component c is in state i

Variables
ic, c ∈ {1, ..., NC}: state of component c at the current stage
iNC+1: electricity state at the current stage
jc, c ∈ {1, ..., NC}: state of component c at the next stage
jNC+1: electricity state at the next stage
uc, c ∈ {1, ..., NC}: decision variable for component c

State and control space
xck, c ∈ {1, ..., NC}: state of component c at stage k
xc: a component state
xNC+1k: electricity state at stage k
uck: maintenance decision for component c at stage k

Probability functions
λc(i): failure probability function for component c

Sets
Ωxc: state space for component c
ΩxNC+1: electricity state space
Ωuc(ic): decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.
• It is possible at each stage to decide to replace a component in order to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is done on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)ᵀ   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.
Component space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)ᵀ   (9.3)

The decision space for each decision variable is defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, ∅ otherwise
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)
Case 2

If one of the components is in maintenance, or a preventive maintenance decision is made:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC Pc

with

Pc = P(jc, 1, ic) if uc = 1 or ic ∉ {W1, ..., WNWc}
Pc = 1 if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
Pc = 0 otherwise

(Working components that are not maintained keep their state, since the system does not age while it is stopped.)
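The two cases above can be combined into one function. A sketch for NC = 2 (hypothetical helper names; P(jc, uc, ic) is a per-component kernel as in the one-component model):

```python
def joint_transition(j, u, i, P, working):
    # Case 1 / Case 2 of Section 9.2.4.3 (a sketch; P(jc, uc, ic) is the
    # per-component kernel and working(ic) tells whether ic is a W state).
    if all(working(ic) for ic in i) and not any(u):
        prob = 1.0                      # Case 1: components age independently
        for jc, ic in zip(j, i):
            prob *= P(jc, 0, ic)
        return prob
    prob = 1.0                          # Case 2: the system is stopped
    for jc, uc, ic in zip(j, u, i):
        if uc == 1 or not working(ic):
            prob *= P(jc, 1, ic)
        elif jc == ic:                  # idle working components do not age
            prob *= 1.0
        else:
            return 0.0
    return prob

# Tiny hypothetical per-component kernel: NW = 1, NPM = 1, NCM = 2.
def P(j, u, i):
    if i == "CM1":                      # repair finishes: back to "as new"
        return 1.0 if j == "W0" else 0.0
    if u == 1:                          # preventive replacement (PM1 = W0)
        return 1.0 if j == "W0" else 0.0
    nxt, p_fail = {"W0": ("W1", 0.1), "W1": ("W1", 0.3)}[i]
    if j == "CM1":
        return p_fail
    return 1.0 - p_fail if j == nxt else 0.0

working = lambda s: s.startswith("W")
states = ["W0", "W1", "CM1"]
# For any (i, u), the joint distribution over (j1, j2) must sum to one.
for i in [("W0", "W1"), ("W0", "CM1")]:
    for u in [(0, 0), (1, 0)]:
        total = sum(joint_transition((j1, j2), u, i, P, working)
                    for j1 in states for j2 in states)
        assert abs(total - 1.0) < 1e-12
print(round(joint_transition(("W1", "W1"), (0, 0), ("W0", "W1"), P, working), 6))
# 0.63
```

The printed value is the Case 1 product 0.9 × 0.7, and the loop verifies that both cases define proper probability distributions.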
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic, jc ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC Cc

with

Cc = CCMc if ic ∈ {CM1, ..., CM(NCMc−1)} or jc = CM1
Cc = CPMc if ic ∈ {PM1, ..., PM(NPMc−1)} or jc = PM1
Cc = 0 otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:
• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, instead of an individual decision space for each component state variable.
• Include other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
The shortest path problem is solved with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin(u ∈ {0, 1}) {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin(u ∈ {0, 1, 2}) {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin(u ∈ {1, 2}) {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin(u ∈ {0, 1}) {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin(u ∈ {0, 1, 2}) {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin(u ∈ {1, 2}) {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin(u ∈ {0, 1, 2}) {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
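The backward recursion above is easy to verify numerically. The sketch below re-computes the table (the arc costs C(k, i, u), with u indexing the successor state, are taken from the calculations above):

```python
# Arc costs C[(k, i, u)]: cost from state i at stage k to state u at stage k+1.
C = {(3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
     (2, 0, 0): 2, (2, 0, 1): 5,
     (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
     (2, 2, 1): 1, (2, 2, 2): 2,
     (1, 0, 0): 4, (1, 0, 1): 6,
     (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
     (1, 2, 1): 5, (1, 2, 2): 2,
     (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3}

J = {0: 0}                        # terminal cost at stage 4
for k in (3, 2, 1, 0):            # backward value iteration
    J = {i: min(c + J[u] for (kk, ii, u), c in C.items()
                if kk == k and ii == i)
         for i in {ii for (kk, ii, _) in C if kk == k}}
print(J)   # stage-0 cost-to-go from node A
# {0: 8}
```

The result matches J*0(0) = J*(A) = 8 computed by hand above.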
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Computers & Operations Research, 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
66
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
67
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005
68
attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).
The emergence of new condition-based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition-based monitoring systems.
3.2 Costs
Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:
• Manpower cost: The cost of the maintenance team that performs maintenance actions.
• Spare part cost: The cost of a new component is an important part of the maintenance cost.
• Maintenance equipment cost: Special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.
• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.
• Unserved energy/interruption cost: If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.
• Inspection/monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is performed on an asset).
3.3 Main Constraints
Possible constraints for the maintenance of power systems have been identified as follows:
• Manpower: The size and availability of the maintenance staff are limited.
• Maintenance Equipment: The equipment needed for undertaking the maintenance must be available.
• Weather: The weather can force certain maintenance actions to be postponed; e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.
• Availability of Spare Parts: If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. Transportation then has a cost and takes time.
• Maintenance Contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.
• Availability of Condition Monitoring Information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.
• Statistical Data: Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.
The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.
In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:
An optimal policy has the property that whatever the initial state and first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have any influence on the actual evolution of the system and the possible actions.
Basically, in maintenance problems it would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.
If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.
Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.
Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.
Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that a system is stationary, that is, it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. It can be a good approximation if the lifetime of a system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 5, 6 and 7). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the time interval between two stages will have an influence on the result.
Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.
A continuous set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 5. Continuous decisions refer to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).
Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.
Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, along with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω^X_k. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω^U_k(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated to the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamic of the system:
J*_0(X_0) = min_{U_k} [ Σ_{k=0..N−1} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, ..., N − 1
N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
X_k: state at stage k
U_k: decision action at stage k
C_k(i, u): cost function
C_N(i): terminal cost for state i
f_k(i, u): dynamic function
J*_0(i): optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:
J*_k(i) = min_{u∈Ω^U_k(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }     (4.1)

J*_k(i): optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J*_N(i) = C_N(i)   ∀i ∈ Ω^X_N
J*_k(i) = min_{u∈Ω^U_k(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }   ∀i ∈ Ω^X_k
U*_k(i) = argmin_{u∈Ω^U_k(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }   ∀i ∈ Ω^X_k

u: decision variable
U*_k(i): optimal decision action at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: a staged network with node A at stage 0; nodes B, C, D at stage 1; nodes E, F, G at stage 2; nodes H, I, J at stage 3; and node K at stage 4. Each arc between consecutive stages is labelled with its cost.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated to each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: k = 0, 1, 2, 3, 4, with N = 4 the terminal stage.
State Space
The state space is defined for each stage:

Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:
Ω^U_k(i) = {0, 1} for i = 0; {0, 1, 2} for i = 1; {1, 2} for i = 2, for k = 1, 2, 3
Ω^U_0(0) = {0, 1, 2} for k = 0
For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E or U_1(0) = 1 for the transition B ⇒ F.
Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F or u_1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.
The transition costs are defined equal to the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function
J*_0(0) = min_{U_k∈Ω^U_k(X_k)} [ Σ_{k=0..N−1} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., N − 1
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.
The solutions of the algorithm are given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2, μ_1(2) = 2).
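The backward recursion of this example can be sketched in Python. The stage/node structure follows the example, but since the figure's arc costs survive only partially in the text, most of the costs below are hypothetical stand-ins (only a few, e.g. C(A ⇒ B) = 2, C(B ⇒ F) = 6, C(F ⇒ J) = 2 and C(J ⇒ K) = 7, are recoverable), so the numerical results are purely illustrative:

```python
# Deterministic value iteration on a staged shortest-path graph.
# cost[k][(i, u)] = cost of moving from state i at stage k to state u
# at stage k+1 (the dynamic is f_k(i, u) = u). Most costs are invented.
cost = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},                       # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6, (1, 0): 3, (1, 1): 2, (1, 2): 5,
        (2, 1): 1, (2, 2): 2},                                   # B, C, D -> E, F, G
    2: {(0, 0): 5, (0, 1): 2, (1, 1): 2, (1, 2): 2, (2, 1): 4,
        (2, 2): 3},                                              # E, F, G -> H, I, J
    3: {(0, 0): 2, (1, 0): 2, (2, 0): 7},                        # H, I, J -> K
}
N = 4  # terminal stage (node K)

def value_iteration(cost, N):
    """Backward recursion: J*_k(i) = min_u { C_k(i,u) + J*_{k+1}(u) }."""
    J = {N: {0: 0.0}}              # no terminal cost at node K
    policy = {}
    for k in range(N - 1, -1, -1):
        J[k], policy[k] = {}, {}
        states = {i for (i, _) in cost[k]}
        for i in states:
            # candidate cost-to-go for each admissible decision u
            candidates = {u: c + J[k + 1][u]
                          for (s, u), c in cost[k].items() if s == i}
            u_best = min(candidates, key=candidates.get)
            J[k][i], policy[k][i] = candidates[u_best], u_best
    return J, policy

J, policy = value_iteration(cost, N)
print(J[0][0])   # optimal cost-to-go from node A with these illustrative costs
```

For such a small graph the result can be cross-checked by enumerating all paths, which is exactly the inefficient alternative mentioned above.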
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω^X_k.
Decision Space
At each decision epoch the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).
Dynamic of the System and Transition Probability
Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N − 1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control are i and u at stage k. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated to each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(j, u, i).
A terminal cost C_N(i) can be used to penalize deviation from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:
J*(X_0) = min_{U_k∈Ω^U_k(X_k)} E[ C_N(X_N) + Σ_{k=0..N−1} C_k(X_{k+1}, U_k, X_k) ]

subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N − 1
N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
X_k: state at stage k
U_k: decision action at stage k
ω_k(i, u): probabilistic function of the disturbance
C_k(j, u, i): cost function
C_N(i): terminal cost for state i
f_k(i, u, ω): dynamic function
J*_0(i): optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u∈Ω^U_k(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]     (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be re-written using the transition probabilities:

J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]     (5.2)

Ω^X_k: state space at stage k
Ω^U_k(i): decision space at stage k for state i
P_k(j, u, i): transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on Equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*_N(i) = C_N(i)   ∀i ∈ Ω^X_N   (initialisation)

While k ≥ 0 do:
J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀i ∈ Ω^X_k
U*_k(i) = argmin_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀i ∈ Ω^X_k
k ← k − 1
u: decision variable
U*_k(i): optimal decision action at stage k for state i
The recursion finishes when the first stage is reached
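As an illustration of this recursion, the following sketch applies finite horizon stochastic value iteration to a hypothetical two-state component (0 = working, 1 = failed) with two actions (0 = do nothing, 1 = repair). All probabilities and costs are invented for the example, and for brevity the cost is taken to depend only on the current state and action (i, u):

```python
# Finite horizon stochastic value iteration on a hypothetical
# two-state maintenance example. All numbers below are assumptions.
P = {  # P[(i, u)] = {j: probability that the next state is j}
    (0, 0): {0: 0.9, 1: 0.1},   # left alone, a working component may fail
    (0, 1): {0: 1.0},           # preventive maintenance keeps it working
    (1, 0): {1: 1.0},           # a failed component stays failed
    (1, 1): {0: 1.0},           # repair restores the component
}
C = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 5.0, (1, 1): 4.0}  # immediate costs
C_N = {0: 0.0, 1: 5.0}   # terminal cost penalizes ending in the failed state
N = 10                   # number of stages

def stochastic_value_iteration(P, C, C_N, N):
    """J*_k(i) = min_u { C(i,u) + sum_j P(j|i,u) * J*_{k+1}(j) }."""
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    J[N] = dict(C_N)                       # initialisation with terminal costs
    for k in range(N - 1, -1, -1):         # backward recursion
        for i in (0, 1):
            best_u, best_val = None, float("inf")
            for u in (0, 1):
                val = C[(i, u)] + sum(p * J[k + 1][j]
                                      for j, p in P[(i, u)].items())
                if val < best_val:
                    best_u, best_val = u, val
            J[k][i], U[k][i] = best_val, best_u
    return J, U

J, U = stochastic_value_iteration(P, C, C_N, N)
print(J[0][0], U[0])   # expected cost from a working component, stage-0 policy
```

With these assumed numbers the computed policy repairs a failed component and leaves a working one alone, which is the intuitively expected behavior.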
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with
• N stages
• N_X state variables; the size of the set for each state variable is S
• N_U control variables; the size of the set for each control variable is A

The time complexity of the value iteration algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
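The operation count behind this complexity estimate can be made concrete in a few lines; the figures used below (10 stages, S = 10, A = 2) are arbitrary illustrative choices, not data from the thesis:

```python
# Back-of-the-envelope illustration of the curse of dimensionality:
# value iteration work grows as N * S**(2*NX) * A**NU, i.e. for each of
# the N stages, each of the S**NX states, and each of the A**NU actions,
# a sum over the S**NX possible successor states is evaluated.

def vi_operations(N, S, NX, A, NU):
    """Elementary operation count of finite horizon value iteration."""
    return N * S ** (2 * NX) * A ** NU

for NX in (1, 2, 3, 4):
    # adding one state variable multiplies the work by S**2 = 100 here
    print(NX, vi_operations(N=10, S=10, NX=NX, A=2, NU=1))
```

Each additional state variable multiplies the work by S², which is why even modest multi-component models quickly become intractable for exact methods.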
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for the component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.
Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure a component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamic of the system only depends on the current state of the system (and possibly on the time, if the system dynamic is not stationary).
This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamic of the deterioration process.
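The usual way to add such memory, sketched below, is state augmentation: the state is redefined as the pair (preceding level, current level) so that the Markov property is restored. The three deterioration levels are a hypothetical example:

```python
# State augmentation sketch for a one-step time lag. To keep the Markov
# property when the dynamic depends on the preceding deterioration level,
# the state is redefined as the pair (previous level, current level).
import itertools

LEVELS = (0, 1, 2)   # hypothetical deterioration levels: new, worn, failed

# The augmented state space is the Cartesian product of the level set
# with itself, so its size is squared: the computational price of memory.
augmented_states = list(itertools.product(LEVELS, LEVELS))

def augmented_transition(state, next_level):
    """Moving to next_level shifts the pair: current becomes previous."""
    _, current = state
    return (current, next_level)

s = (0, 1)                       # was at level 0, is now at level 1
s = augmented_transition(s, 2)   # deteriorates further to level 2
print(s)                         # the pair now remembers the previous level
```

The squared state-space size here makes concrete why keeping even one preceding state in memory is computationally expensive, as noted above.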
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time. The dynamic of the system as well as the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov decision process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] is recommended.
In practice, one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space onto the control space: for i ∈ Ω^X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω^U(i).
The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is inevitably reached. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0..N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...

μ: decision policy
J*(i): optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost at stage k for discounted IHSDP has the form α^k · C_ij(u).
As C_ij(u) is bounded, the infinite sum will converge (decreasing geometric progression):

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0..N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...

α: discount factor
Average cost per stage problems
Infinite horizon problems can sometimes neither be represented with a cost-free termination state nor be discounted.
To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0..N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...
6.2 Optimality Equations
The optimality equations are formulated using the transition probabilities P_ij(u), the probability that the next state is j given that the current state is i and the control is u.
The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P_ij(u) · [C_ij(u) + J*(j)]   ∀i ∈ Ω^X

J_μ(i): cost-to-go function of policy μ starting from state i
J*(i): optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P_ij(u) · [C_ij(u) + α · J*(j)]   ∀i ∈ Ω^X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.7.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy. It can be shown that the algorithm does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined for the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
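A minimal sketch of value iteration for a discounted infinite horizon problem, on a hypothetical two-state component (0 = working, 1 = failed) with actions 0 = do nothing and 1 = repair; all probabilities, costs and the discount factor are assumptions. Iteration stops when the sup-norm change of the value function falls below a small tolerance:

```python
# Value iteration for a discounted MDP on a hypothetical two-state
# component. All numbers below are illustrative assumptions.
P = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 1.0},
     (1, 0): {1: 1.0},         (1, 1): {0: 1.0}}
C = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 5.0, (1, 1): 4.0}
ALPHA = 0.9                     # discount factor

def value_iteration(P, C, alpha, eps=1e-8):
    """Iterate J(i) <- min_u { C(i,u) + alpha * sum_j P(j|i,u) * J(j) }."""
    states = (0, 1)
    J = {i: 0.0 for i in states}
    while True:
        J_new = {i: min(C[(i, u)]
                        + alpha * sum(p * J[j] for j, p in P[(i, u)].items())
                        for u in (0, 1))
                 for i in states}
        if max(abs(J_new[i] - J[i]) for i in states) < eps:
            return J_new
        J = J_new

J = value_iteration(P, C, ALPHA)
# greedy (stationary) policy with respect to the converged value function
policy = {i: min((0, 1), key=lambda u: C[(i, u)]
                 + ALPHA * sum(p * J[j] for j, p in P[(i, u)].items()))
          for i in (0, 1)}
print(J, policy)
```

Because the discounted Bellman operator is a contraction with modulus α, the iteration is guaranteed to converge; the stopping tolerance controls how close the returned J is to the fixed point.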
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ_0. It can then be described by the following steps.
Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μ_q}(i), the solution of the following linear system, is calculated:

J_{μ_q}(i) = Σ_{j∈Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_{μ_q}(j)], ∀i ∈ Ω_X

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ_q}(j)], ∀i ∈ Ω_X

Go back to the policy evaluation step. The process stops when μ_{q+1} = μ_q.
At each iteration the algorithm improves the policy. If the initial policy μ_0 is already good, the algorithm will converge fast to the optimal solution.
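The two steps can be sketched as follows, again on a hypothetical discounted model (a discount factor keeps the evaluation linear system well-posed in this sketch):

```python
import numpy as np

# Hypothetical discounted MDP: P[i, u, j] = P(j, u, i), C[i, u, j] = C(j, u, i).
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.7, 0.3]]])
C = np.array([[[1.0, 4.0], [2.0, 2.0]],
              [[3.0, 1.0], [0.5, 3.0]]])
alpha = 0.9
n = P.shape[0]

mu = np.zeros(n, dtype=int)  # initial policy mu_0
while True:
    # Step 1: policy evaluation -- solve the linear system
    # J(i) = sum_j P(j, mu(i), i) * (C(j, mu(i), i) + alpha * J(j))
    Pmu = P[np.arange(n), mu]                               # (n, n) under mu
    cmu = np.sum(P[np.arange(n), mu] * C[np.arange(n), mu], axis=1)
    J = np.linalg.solve(np.eye(n) - alpha * Pmu, cmu)
    # Step 2: policy improvement
    Q = np.einsum('iuj,iuj->iu', P, C + alpha * J[None, None, :])
    mu_new = Q.argmin(axis=1)
    if np.array_equal(mu_new, mu):  # policy solves its own improvement
        break
    mu = mu_new
```

With exact evaluation, each pass strictly improves the policy, so the loop terminates after finitely many iterations.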
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation can be computationally intensive.

An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the true value J_{μ_k}(i).
While m ≥ 0, do:

J^m_{μ_k}(i) = Σ_{j∈Ω_X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j)], ∀i ∈ Ω_X

m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
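The evaluation step can be sketched as follows, on the same kind of hypothetical discounted data as before; instead of solving the linear system exactly, M backups are applied under the fixed policy:

```python
import numpy as np

# Hypothetical discounted MDP data (invented): P[i, u, j], C[i, u, j].
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.7, 0.3]]])
C = np.array([[[1.0, 4.0], [2.0, 2.0]],
              [[3.0, 1.0], [0.5, 3.0]]])
alpha = 0.9
n = P.shape[0]
mu = np.array([0, 1])   # some fixed policy mu_k to evaluate
M = 200                 # number of evaluation backups

Pmu = P[np.arange(n), mu]
cmu = np.sum(P[np.arange(n), mu] * C[np.arange(n), mu], axis=1)

J = np.full(n, 100.0)   # initialization chosen above the true value
for _ in range(M):      # m = M, M-1, ..., 1
    J = cmu + alpha * Pmu @ J
# J now approximates the cost-to-go of policy mu
```

Each backup contracts the error by α, so a moderate M already gives a good approximation of the exact evaluation.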
6.6 Average Cost-to-go Problems
The methods presented in Sections 5.1–5.4 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain, that is, if every stationary policy generates a Markov chain that consists of a single ergodic class and possibly some transient states (see [36] for details).
Given a stationary policy μ and an arbitrary state X ∈ Ω_X, there is a unique scalar λ_μ and vector h_μ such that

h_μ(X) = 0
λ_μ + h_μ(i) = Σ_{j∈Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)], ∀i ∈ Ω_X

This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.
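The pair (λ_μ, h_μ) can be computed by solving this linear system directly. A sketch with a hypothetical unichain chain under a fixed policy (the matrix Pmu and expected stage costs cmu are invented data):

```python
import numpy as np

# Hypothetical unichain Markov chain under a fixed policy mu:
# Pmu[i, j] = transition probabilities, cmu[i] = expected stage cost.
Pmu = np.array([[0.5, 0.5, 0.0],
                [0.1, 0.6, 0.3],
                [0.4, 0.0, 0.6]])
cmu = np.array([2.0, 1.0, 4.0])
n = len(cmu)

# Unknowns x = (lambda, h(1), ..., h(n-1)), with h(0) = 0 as reference state.
# Equation i: lambda + h(i) - sum_j Pmu[i, j] * h(j) = cmu[i]
A = np.zeros((n, n))
A[:, 0] = 1.0                # coefficient of lambda in every equation
for j in range(1, n):
    A[:, j] = -Pmu[:, j]     # -Pmu[i, j] * h(j)
    A[j, j] += 1.0           # +h(i) when i = j >= 1
x = np.linalg.solve(A, cmu)
lam, h = x[0], np.concatenate(([0.0], x[1:]))
# lam is the average cost per stage of mu; h the relative cost-to-go values
```

Fixing h at the reference state removes the one-dimensional ambiguity of the system, which is what makes the solution unique.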
The optimal average cost and the optimal policy satisfy the Bellman equation

λ* + h*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X

μ*(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h_0(i) is chosen arbitrarily.
H_k = min_{u∈Ω_U(X)} Σ_{j∈Ω_X} P(j, u, X) · [C(j, u, X) + h_k(j)]

h_{k+1}(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)] − H_k, ∀i ∈ Ω_X

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)], ∀i ∈ Ω_X
The sequence h_k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
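A relative value iteration sketch on a small hypothetical unichain MDP (same array convention as the earlier sketches; all data invented):

```python
import numpy as np

# Hypothetical unichain MDP: P[i, u, j] = P(j, u, i), C[i, u, j] = C(j, u, i).
P = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.3, 0.7], [0.6, 0.4]]])
C = np.array([[[2.0, 1.0], [3.0, 0.5]],
              [[1.0, 2.5], [0.2, 2.0]]])
n = P.shape[0]
Xbar = 0  # arbitrary reference state X

h = np.zeros(n)
for _ in range(2000):
    # (T h)(i) = min_u sum_j P(j, u, i) * (C(j, u, i) + h(j))
    Th = np.einsum('iuj,iuj->iu', P, C + h[None, None, :]).min(axis=1)
    Hk = Th[Xbar]              # offset evaluated at the reference state
    h_new = Th - Hk
    if np.max(np.abs(h_new - h)) < 1e-10:
        h = h_new
        break
    h = h_new
lam = Hk                       # converges to the optimal average cost
```

Subtracting H_k at every sweep keeps the iterates bounded, which is exactly what plain value iteration lacks in the average cost setting.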
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.

Initialization: X can be chosen arbitrarily.

Step 1: Policy Evaluation

If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i), ∀i ∈ Ω_X, stop the algorithm. Else, solve the system of equations

h_q(X) = 0
λ_q + h_q(i) = Σ_{j∈Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)], ∀i ∈ Ω_X

Step 2: Policy Improvement

μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_q(j)], ∀i ∈ Ω_X

q = q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP,

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize: Σ_{i∈Ω_X} J(i)
Subject to: J(i) ≤ Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J(j)], ∀u ∈ Ω_U(i), ∀i ∈ Ω_X
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and actions are not taken continuously (problems of that kind refer to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how an SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7

Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any possible future input data. Many approaches are possible, such as artificial neural networks, decision tree learning or Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space, and are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions from samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from the state X_k is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i after the m-th visit
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

where γ_{X_k} corresponds to 1/m, with m the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) is calculated from the whole trajectory, so the updates can only be made once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go function J(X_k) is updated for all the states visited previously during the trajectory. Assuming that the l-th transition has just been generated:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
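A TD(0) sketch for evaluating a fixed policy on a hypothetical three-state shortest path problem (states 0 and 1 are transient, state 2 is terminal and cost-free); the transition data are invented for illustration:

```python
import random

# Transition probabilities and costs under the fixed policy (invented data):
# P[i][j] = probability of moving from i to j, C[i][j] = associated cost.
P = {0: [0.6, 0.3, 0.1], 1: [0.2, 0.3, 0.5]}
C = {0: [1.0, 2.0, 0.5], 1: [1.5, 1.0, 0.3]}

random.seed(0)
J = [0.0, 0.0, 0.0]               # J(terminal state) stays 0
visits = [0, 0, 0]

for _ in range(20000):            # simulated trajectories
    x = random.choice([0, 1])     # random starting state
    while x != 2:
        y = random.choices([0, 1, 2], weights=P[x])[0]
        visits[x] += 1
        gamma = 1.0 / visits[x]   # step size 1/m, as in the text
        # TD(0) update: J(x) <- J(x) + gamma * (C(x, y) + J(y) - J(x))
        J[x] += gamma * (C[x][y] + J[y] - J[x])
        x = y
```

For this chain, the exact cost-to-go values are J(0) = 5 and J(1) = 2.5, which the simulated estimates approach as the number of trajectories grows.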
Q-factors: Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_{μ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q_{μ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by
Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The exploration/exploitation trade-off: The convergence of the algorithm to the optimal solution would require that all pairs (x, u) be tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
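A Q-learning sketch on a hypothetical two-state discounted problem. Two small deviations from the text are made deliberately: an ε-greedy rule implements the exploration/exploitation trade-off, and a polynomial step size 1/m^0.7 replaces 1/m, a common practical choice that speeds up convergence:

```python
import random

# Hypothetical 2-state discounted MDP: P[i][u][j], C[i][u][j] (invented data).
P = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.2, 0.8], [0.7, 0.3]]]
C = [[[1.0, 4.0], [3.0, 3.0]],
     [[4.0, 2.0], [0.5, 3.0]]]
alpha = 0.8                        # discount factor
eps = 0.3                          # exploration probability

random.seed(1)
Q = [[0.0, 0.0], [0.0, 0.0]]
visits = [[0, 0], [0, 0]]
x = 0
for _ in range(300000):
    # exploration / exploitation trade-off (epsilon-greedy)
    if random.random() < eps:
        u = random.randint(0, 1)
    else:
        u = 0 if Q[x][0] <= Q[x][1] else 1
    y = random.choices([0, 1], weights=P[x][u])[0]
    visits[x][u] += 1
    g = 1.0 / visits[x][u] ** 0.7  # polynomial step size (practical choice)
    # Q-learning update based on (7.3)
    Q[x][u] = (1 - g) * Q[x][u] + g * (C[x][u][y] + alpha * min(Q[y]))
    x = y
```

The greedy policy extracted from the learned Q-factors converges to the optimal policy, even though the transition probabilities are never used by the learner, only by the simulator.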
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the preceding section for each sample of experience.

- Building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that is optimized based on the available samples of J_μ. In the table representation investigated previously, J_μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
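The idea can be sketched with a linear architecture J̃(i, r) = φ(i)ᵀr fitted by least squares; the target function, features and noise below are all hypothetical:

```python
import numpy as np

# Approximate a cost-to-go function J_mu(i) over a large state space by a
# linear architecture fitted on sampled (state, observed cost-to-go) pairs.
rng = np.random.default_rng(0)

def phi(i):
    # Feature vector of a state i (here: polynomial features of the age)
    return np.array([1.0, i, i ** 2])

states = rng.integers(0, 100, size=500)          # sampled states
targets = 0.05 * states ** 2 - states + 20 \
          + rng.normal(0.0, 5.0, size=500)       # noisy sampled costs-to-go

Phi = np.stack([phi(i) for i in states])         # design matrix
r, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

def J_approx(i):
    return phi(i) @ r                            # only r is stored, no table
```

The table of J-values is replaced by the three numbers in r, at the cost of an approximation error that depends on how well the features capture the true function.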
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
Chapter 8

Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], a SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The state of each unit is the number of remaining stages of maintenance; a unit not in maintenance can fail during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm; the model is proved to be unichain before the algorithm is applied. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Both major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Process
Many condition-based maintenance models based on SMDPs have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature considered only single components with a single state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Methods: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; possible approaches are average cost-to-go, discounted, and shortest path
  Possible applications in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
  Methods: classical methods for MDP - value iteration (VI, can converge fast for a high discount factor), policy iteration (PI, faster in general), linear programming (possible additional constraints; state space more limited than for VI and PI)

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: optimization of inspection-based maintenance
  Methods: same as MDP (average cost-to-go approach)
  Advantages/disadvantages: complex

Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than classical MDP methods
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model
Chapter 9

A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and to avoid maintenance during a profitable period. This idea was incorporated in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

N_E: Number of electricity scenarios
N_W: Number of working states for the component
N_PM: Number of preventive maintenance states for one component
N_CM: Number of corrective maintenance states for one component

Costs

C_E(s, k): Electricity cost at stage k for the electricity state s
C_I: Cost per stage for interruption
C_PM: Cost per stage of preventive maintenance
C_CM: Cost per stage of corrective maintenance
C_N(i): Terminal cost if the component is in state i

Variables

i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage

State and Control Space

x1_k: Component state at stage k
x2_k: Electricity state at stage k

Probability function

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state W_i

Sets

Ω_x1: Component state space
Ω_x2: Electricity state space
Ω_U(i): Decision space for state i

State notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of a preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of a transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector X_k is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (N_X = 2).

The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k), x1_k ∈ Ω_x1, x2_k ∈ Ω_x2   (9.1)

Ω_x1 is the set of possible states for the component, and Ω_x2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that it has undergone preventive maintenance during the last stage. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.
To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; Tmax can then correspond, for example, to the time after which λ(t) > 50%. This second approach was implemented. In both cases, the corresponding number of W states is N_W = Tmax/Ts, or the closest integer.
50
[Figure 9.1 here: a state-transition diagram over the states CM_2, CM_1, W_0, …, W_4, PM_1, with failure transitions of probability T_s·λ(q) from each state W_q and aging transitions of probability 1 − T_s·λ(q).]

Figure 9.1: Example of a Markov Decision Process for one component, with N_CM = 3, N_PM = 2, N_W = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x_k^1 ∈ Ω_{x^1} = {W_0, …, W_4, PM_1, CM_1, CM_2}. The state W_0 is used to represent a new component; PM_2 and CM_3 are both represented by this state.

More generally,

Ω_{x^1} = {W_0, …, W_{N_W}, PM_1, …, PM_{N_PM−1}, CM_1, …, CM_{N_CM−1}}
Electricity scenario state
Electricity scenarios are associated with the state variable x_k^2. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x_k^2 ∈ Ω_{x^2} = {S_1, …, S_{N_E}}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2 here: electricity prices (in SEK/MWh, between 200 and 500) plotted against the stages k−1, k, k+1 for Scenarios 1, 2 and 3.]

Figure 9.2: Example of electricity scenarios, N_E = 3.
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i^1:

Ω_U(i) = {0, 1} if i^1 ∈ {W_1, …, W_{N_W}}
Ω_U(i) = ∅ otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
  = P(x_{k+1}^1 = j^1, x_{k+1}^2 = j^2 | u_k = u, x_k^1 = i^1, x_k^2 = i^2)
  = P(x_{k+1}^1 = j^1 | u_k = u, x_k^1 = i^1) · P(x_{k+1}^2 = j^2 | x_k^2 = i^2)
  = P(j^1, u, i^1) · P_k(j^2, i^2)
Component state transition probability

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the stage and equal to λ(W_q) = λ(q · T_s).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM_1, respectively CM_1, corresponds to W_0.
Electricity State

The transition probabilities of the electricity state, P_k(j^2, i^2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios over a 12-stage horizon. In this example, P_k(j^2, i^2) can take three different values, defined by the transition matrices P_E^1, P_E^2 and P_E^3; i^2 is represented by the rows of the matrices and j^2 by the columns.
Table 9.1: Transition probabilities

i^1                           u    j^1        P(j^1, u, i^1)
W_q, q ∈ {0, …, N_W − 1}      0    W_{q+1}    1 − λ(W_q)
W_q, q ∈ {0, …, N_W − 1}      0    CM_1       λ(W_q)
W_{N_W}                       0    W_{N_W}    1 − λ(W_{N_W})
W_{N_W}                       0    CM_1       λ(W_{N_W})
W_q, q ∈ {0, …, N_W}          1    PM_1       1
PM_q, q ∈ {1, …, N_PM − 2}    ∅    PM_{q+1}   1
PM_{N_PM−1}                   ∅    W_0        1
CM_q, q ∈ {1, …, N_CM − 2}    ∅    CM_{q+1}   1
CM_{N_CM−1}                   ∅    W_0        1
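As a sanity check, the non-zero entries of Table 9.1 can be assembled into explicit transition matrices, one per decision. The sketch below is illustrative only: the state encoding (W_0, …, W_{N_W}, then the PM states, then the CM states), the function name and the numeric failure probabilities are assumptions, with `lam[q]` standing for λ(W_q).

```python
import numpy as np

def component_transition_matrices(lam, n_w, n_pm, n_cm):
    """Build P0[i, j] (u = 0) and P1[i, j] (u = 1) following Table 9.1.

    States are encoded 0..n_w for W0..W_NW, then PM1..PM_{n_pm-1},
    then CM1..CM_{n_cm-1}; lam[q] is the failure probability in Wq.
    """
    n = (n_w + 1) + (n_pm - 1) + (n_cm - 1)
    W0 = 0
    PM = list(range(n_w + 1, n_w + n_pm))
    CM = list(range(n_w + n_pm, n))
    P0, P1 = np.zeros((n, n)), np.zeros((n, n))
    for q in range(n_w + 1):
        nxt = q + 1 if q < n_w else n_w               # W_NW keeps its age
        P0[q, nxt] += 1 - lam[q]
        P0[q, CM[0] if CM else W0] += lam[q]          # failure -> CM1 (or W0 if N_CM = 1)
        P1[q, PM[0] if PM else W0] = 1                # replacement -> PM1 (or W0 if N_PM = 1)
    for s in PM:                                      # maintenance chains ignore u
        nxt = s + 1 if s + 1 in PM else W0
        P0[s, nxt] = P1[s, nxt] = 1
    for s in CM:
        nxt = s + 1 if s + 1 in CM else W0
        P0[s, nxt] = P1[s, nxt] = 1
    return P0, P1

# The example of Figure 9.1: N_W = 4, N_PM = 2, N_CM = 3 gives 8 states
P0, P1 = component_transition_matrices([0.1, 0.2, 0.3, 0.4, 0.5], 4, 2, 3)
```

Every row of both matrices sums to one, which is a convenient check that no transition of Table 9.1 has been dropped.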
Table 9.2: Example of transition matrices for the electricity scenarios

P_E^1 = [1 0 0; 0 1 0; 0 0 1]

P_E^2 = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]

P_E^3 = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities over a 12-stage horizon

Stage (k)       0      1      2      3      4      5      6      7      8      9      10     11
P_k(j^2, i^2)   P_E^1  P_E^1  P_E^1  P_E^3  P_E^3  P_E^2  P_E^2  P_E^2  P_E^3  P_E^1  P_E^1  P_E^1
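These stage-dependent matrices are straightforward to store as a stage-indexed list; a small sketch (the variable names are illustrative):

```python
import numpy as np

P1E = np.eye(3)                            # scenarios never switch
P2E = np.full((3, 3), 1.0 / 3.0)           # uniform switching
P3E = np.array([[0.6, 0.2, 0.2],
                [0.2, 0.6, 0.2],
                [0.2, 0.2, 0.6]])          # "sticky" scenarios

# Stage k -> transition matrix, as in Table 9.3 (12-stage horizon);
# rows index the current scenario i2, columns the next scenario j2
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

# each matrix must be row-stochastic
assert all(np.allclose(P.sum(axis=1), 1.0) for P in schedule)
```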
9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · T_s · C_E(i^2, k) (depends on the electricity scenario state i^2 and the stage k)

• Cost of maintenance: C_CM or C_PM

• Cost of interruption: C_I

Moreover, a terminal cost, denoted C_N, could be used to penalize deviations from a required state at the end of the time horizon; such a terminal cost C_N(i) would be defined for each possible terminal state i of the component. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i^2 is a state variable.
Table 9.4: Transition costs

i^1                           u    j^1        C_k(j, u, i)
W_q, q ∈ {0, …, N_W − 1}      0    W_{q+1}    G · T_s · C_E(i^2, k)
W_q, q ∈ {0, …, N_W − 1}      0    CM_1       C_I + C_CM
W_{N_W}                       0    W_{N_W}    G · T_s · C_E(i^2, k)
W_{N_W}                       0    CM_1       C_I + C_CM
W_q                           1    PM_1       C_I + C_PM
PM_q, q ∈ {1, …, N_PM − 2}    ∅    PM_{q+1}   C_I + C_PM
PM_{N_PM−1}                   ∅    W_0        C_I + C_PM
CM_q, q ∈ {1, …, N_CM − 2}    ∅    CM_{q+1}   C_I + C_CM
CM_{N_CM−1}                   ∅    W_0        C_I + C_CM
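Together, Sections 9.1.4.1–9.1.4.4 specify a finite horizon stochastic DP that can be solved by backward induction (the value iteration of Chapter 4, with an expectation over the stochastic transitions). A generic sketch follows; the function names and the convention of feeding rewards in as negative costs are assumptions, and states with an empty decision space (the PM and CM states) should be given a single dummy action so that their forced transition is still evaluated.

```python
def backward_induction(n_stages, states, actions, trans, cost, terminal):
    """Finite horizon stochastic DP solver (minimizes expected total cost).

    actions(i)       -> admissible decisions in state i
    trans(k, i, u)   -> iterable of (next_state, probability) pairs
    cost(k, i, u, j) -> transition cost (rewards enter as negative costs)
    terminal(i)      -> terminal cost C_N(i)
    Returns the cost-to-go J[k][i] and an optimal policy U[k][i].
    """
    J = [dict() for _ in range(n_stages + 1)]
    U = [dict() for _ in range(n_stages)]
    J[n_stages] = {i: terminal(i) for i in states}
    for k in range(n_stages - 1, -1, -1):      # backwards from the last stage
        for i in states:
            best_u, best_v = None, float("inf")
            for u in actions(i):
                v = sum(p * (cost(k, i, u, j) + J[k + 1][j])
                        for j, p in trans(k, i, u))
                if v < best_v:
                    best_u, best_v = u, v
            J[k][i], U[k][i] = best_v, best_u
    return J, U
```

For the model of this section, a state would be the pair i = (i^1, i^2), trans(k, i, u) would multiply P(j^1, u, i^1) from Table 9.1 by P_k(j^2, i^2) from Table 9.3, and cost would return −G·T_s·C_E(i^2, k) for production transitions and C_I plus the relevant maintenance cost otherwise, as in Table 9.4.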
9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

N_C        Number of components
N_{W_c}    Number of working states for component c
N_{PM_c}   Number of preventive maintenance states for component c
N_{CM_c}   Number of corrective maintenance states for component c
Costs

C_{PM_c}     Cost per stage of preventive maintenance for component c
C_{CM_c}     Cost per stage of corrective maintenance for component c
C_{N_c}(i)   Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, …, N_C}    State of component c at the current stage
i^{N_C+1}               Electricity state at the current stage
j^c, c ∈ {1, …, N_C}    State of component c at the next stage
j^{N_C+1}               Electricity state at the next stage
u^c, c ∈ {1, …, N_C}    Decision variable for component c

State and Control Spaces

x_k^c, c ∈ {1, …, N_C}   State of component c at stage k
x^c                      A component state
x_k^{N_C+1}              Electricity state at stage k
u_k^c                    Maintenance decision for component c at stage k

Probability functions

λ^c(i)   Failure probability function for component c

Sets

Ω_{x^c}          State space for component c
Ω_{x^{N_C+1}}    Electricity state space
Ω_{u^c}(i^c)     Decision space for component c in state i^c
9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λ^c(t) for component c ∈ {1, …, N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_{CM_c} stages, with a cost of C_{CM_c} per stage.

• At each stage it is possible to decide to replace a component in order to prevent corrective maintenance. The time of preventive replacement for component c is N_{PM_c} stages, with a cost of C_{PM_c} per stage.
• An interruption cost C_I is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · T_s kWh is produced during the stage (T_s in hours).

• A terminal cost C_{N_c} can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x_k^1, …, x_k^{N_C}, x_k^{N_C+1})^T   (9.2)

x_k^c, c ∈ {1, …, N_C}, represents the state of component c; x_k^{N_C+1} represents the electricity state.
Component Space
The numbers of CM and PM states for component c are N_{CM_c} and N_{PM_c}, respectively. The number of W states for each component c, N_{W_c}, is decided in the same way as for one component.

The state space of component c is denoted Ω_{x^c}:

x_k^c ∈ Ω_{x^c} = {W_0, …, W_{N_{W_c}}, PM_1, …, PM_{N_{PM_c}−1}, CM_1, …, CM_{N_{CM_c}−1}}
Electricity Space
Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:
u_k^c = 0: no preventive maintenance on component c
u_k^c = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

U_k = (u_k^1, u_k^2, …, u_k^{N_C})^T   (9.3)
The decision space for each decision variable is defined by

∀c ∈ {1, …, N_C}:   Ω_{u^c}(i^c) = {0, 1} if i^c ∈ {W_0, …, W_{N_{W_c}}}
                    Ω_{u^c}(i^c) = ∅ otherwise
9.2.4.3 Transition Probabilities

The component state variables x^c are independent of the electricity state x^{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)   (9.4)
  = P((j^1, …, j^{N_C}), (u^1, …, u^{N_C}), (i^1, …, i^{N_C})) · P_k(j^{N_C+1}, i^{N_C+1})   (9.5)

The transition probabilities of the electricity state, P_k(j^{N_C+1}, i^{N_C+1}), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. Consequently, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, …, N_C}: i^c ∈ {W_1, …, W_{N_{W_c}}} and u = 0, then

P((j^1, …, j^{N_C}), 0, (i^1, …, i^{N_C})) = ∏_{c=1}^{N_C} P(j^c, 0, i^c)
Case 2

If one of the components is in maintenance, or preventive maintenance is decided, then

P((j^1, …, j^{N_C}), (u^1, …, u^{N_C}), (i^1, …, i^{N_C})) = ∏_{c=1}^{N_C} P^c

with

P^c = P(j^c, 1, i^c)   if u^c = 1 or i^c ∉ {W_1, …, W_{N_{W_c}}}
P^c = 1                if i^c ∈ {W_1, …, W_{N_{W_c}}}, u^c = 0 and j^c = i^c
P^c = 0                otherwise
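The two cases can be combined into a single joint transition probability. A sketch, where `P1(jc, uc, ic)` stands for the one-component transition function P(j, u, i) of Section 9.1 and `is_working(ic)` tests membership of {W_1, …, W_{N_Wc}} (both assumed supplied by the caller):

```python
def joint_transition_prob(j, u, i, P1, is_working):
    """P over the component part of the state vector (cases 1 and 2 above).

    j, u, i are tuples with one entry per component.
    Case 1: all components work and none is maintained -> independent product.
    Case 2: system interrupted -> maintained/failed components follow their
    maintenance chain, while untouched working components keep their state.
    """
    system_up = all(is_working(ic) for ic in i) and not any(u)
    prob = 1.0
    for jc, uc, ic in zip(j, u, i):
        if system_up:
            prob *= P1(jc, 0, ic)
        elif uc == 1 or not is_working(ic):
            prob *= P1(jc, 1, ic)                  # maintenance transition
        else:
            prob *= 1.0 if jc == ic else 0.0       # component does not age
    return prob
```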
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, …, N_C}: i^c ∈ {W_1, …, W_{N_{W_c}}}, then

C((j^1, …, j^{N_C}), 0, (i^1, …, i^{N_C})) = G · T_s · C_E(i^{N_C+1}, k)
Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is incurred, as well as the sum of the costs of all maintenance actions:

C((j^1, …, j^{N_C}), (u^1, …, u^{N_C}), (i^1, …, i^{N_C})) = C_I + ∑_{c=1}^{N_C} C^c

with

C^c = C_{CM_c}   if i^c ∈ {CM_1, …, CM_{N_{CM_c}}} or j^c = CM_1
C^c = C_{PM_c}   if i^c ∈ {PM_1, …, PM_{N_{PM_c}}} or j^c = PM_1
C^c = 0          otherwise
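A matching sketch of the stage cost. For brevity the per-component costs C_CMc and C_PMc are taken equal across components; the classification helpers (`is_working`, `in_cm`, `in_pm`), the labels `CM1` and `PM1`, and the convention of returning the production reward as a negative cost are all assumptions of this sketch:

```python
def joint_stage_cost(j, u, i, i_elec, k, G, Ts, CE, CI, CCM, CPM,
                     is_working, in_cm, in_pm, CM1, PM1):
    """Stage cost: Case 1 (production reward) or Case 2 (interruption)."""
    if (all(is_working(ic) for ic in i) and not any(u)
            and all(is_working(jc) for jc in j)):
        return -G * Ts * CE(i_elec, k)             # reward as negative cost
    total = CI                                     # one interruption cost
    for jc, uc, ic in zip(j, u, i):
        if in_cm(ic) or jc == CM1:
            total += CCM                           # corrective maintenance ongoing
        elif in_pm(ic) or jc == PM1:
            total += CPM                           # preventive maintenance ongoing
    return total
```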
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Other types of maintenance actions: in the model, replacement is the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time can be modelled by adding transition probabilities for the maintenance states.

• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods for solving infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm empirically converges the fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of one.
The main limitation of Dynamic Programming is the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. Until now these methods have mainly been applied to optimal control, but there are new opportunities for applying them to fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either a finite horizon model directly, or a discounted infinite horizon model, which is an approximation of a finite horizon model and requires the system to be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
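The recursion above is mechanical enough to verify with a few lines of code. The arc costs below are exactly the C(k, i, j) values used in the computation (node letters in the comments):

```python
# Arc costs C[(stage, node, next_node)] of the shortest path example
C = {(0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,   # A -> B, C, D
     (1, 0, 0): 4, (1, 0, 1): 6,                 # B -> E, F
     (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,   # C -> E, F, G
     (1, 2, 1): 5, (1, 2, 2): 2,                 # D -> F, G
     (2, 0, 0): 2, (2, 0, 1): 5,                 # E -> H, I
     (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,   # F -> H, I, J
     (2, 2, 1): 1, (2, 2, 2): 2,                 # G -> I, J
     (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7}   # H, I, J -> K

J = {(4, 0): 0}                                  # terminal stage: J*(K) = 0
U = {}
for k in range(3, -1, -1):                       # backwards from stage 3 to 0
    for i in {i for (kk, i, _) in C if kk == k}:
        opts = {jj: C[(k, i, jj)] + J[(k + 1, jj)]
                for (kk, ii, jj) in C if (kk, ii) == (k, i)}
        U[(k, i)] = min(opts, key=opts.get)
        J[(k, i)] = opts[U[(k, i)]]

print(J[(0, 0)])   # optimal cost from A -> 8
```

Tracing U from (0, 0) recovers the optimal path A-D-G-I-K, with total cost 3 + 2 + 1 + 2 = 8.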
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers / Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can force certain maintenance actions to be postponed; e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed; the transportation has a price and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.
Chapter 4
Introduction to Dynamic Programming
This chapter presents general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.
4.1 Introduction
Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively reward) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic Programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that satisfies the principle of optimality:
An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system: previous decisions should not influence the current evolution of the system and the possible actions.

Basically, in maintenance problems this means that maintenance actions only have an effect on the state of the system directly after their accomplishment; they do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and the chosen action. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. Consequently, stochastic maintenance optimization models are of interest.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 4 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would, for example, be to minimize the maintenance costs during the time horizon considered.

Chapters 5 and 6 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 3, 4 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the time interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 5. Continuous decision making refers to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 4.2).

Methods exist for solving dynamic programming models exactly, and they are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 6 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm for solving it. The section is illustrated with a classical example: a simple shortest path problem.
4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω_{X_k}. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω_{U_k}(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).
Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J*_0(X_0) = min_{U_k} ∑_{k=0}^{N−1} C_k(X_k, U_k) + C_N(X_N)

subject to X_{k+1} = f_k(X_k, U_k), k = 0, …, N − 1

N            Number of stages
k            Stage
i            State at the current stage
j            State at the next stage
X_k          State at stage k
U_k          Decision (action) at stage k
C_k(i, u)    Cost function
C_N(i)       Terminal cost for state i
f_k(i, u)    Dynamic function
J*_0(i)      Optimal cost-to-go starting from state i
422 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*_k(i) = min_{u ∈ Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]   (4.1)

J*_k(i): Optimal cost-to-go from stage k to N, starting from state i
The value iteration algorithm is a direct consequence of the optimality equation:

J*_N(i) = C_N(i), ∀i ∈ Ω^X_N

J*_k(i) = min_{u ∈ Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ], ∀i ∈ Ω^X_k

U*_k(i) = argmin_{u ∈ Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ], ∀i ∈ Ω^X_k

u: Decision variable
U*_k(i): Optimal decision action at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
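The backward recursion can be sketched in Python. The interface below (stage-indexed state lists and `actions`, `f`, `cost` callables) is a hypothetical one chosen for the sketch, not notation from the thesis:

```python
def value_iteration(N, states, actions, f, cost, terminal_cost):
    """Backward value iteration for a deterministic finite-horizon DP.

    states[k]    : list of admissible states at stage k (k = 0..N)
    actions(k,i) : admissible decisions at stage k in state i
    f(k,i,u)     : next state, cost(k,i,u): stage cost, terminal_cost(i): C_N(i)
    """
    J = {i: terminal_cost(i) for i in states[N]}        # J*_N(i) = C_N(i)
    policy = {}
    for k in range(N - 1, -1, -1):                      # stages N-1 .. 0
        Jk = {}
        for i in states[k]:
            # minimize C_k(i,u) + J*_{k+1}(f_k(i,u)) over admissible u
            u_best = min(actions(k, i),
                         key=lambda u: cost(k, i, u) + J[f(k, i, u)])
            Jk[i] = cost(k, i, u_best) + J[f(k, i, u_best)]
            policy[(k, i)] = u_best
        J = Jk
    return J, policy                                    # J holds J*_0
```

The returned dictionary maps each initial state to its optimal cost-to-go, and `policy[(k, i)]` gives U*_k(i).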
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
Stage 0   Stage 1   Stage 2   Stage 3   Stage 4
   A      B, C, D   E, F, G   H, I, J      K

Arc costs: A-B: 2, A-C: 4, A-D: 3; B-E: 4, B-F: 6; C-E: 2, C-F: 1, C-G: 3; D-F: 5, D-G: 2; E-H: 2, E-I: 5; F-H: 7, F-I: 3, F-J: 2; G-I: 1, G-J: 2; H-K: 4, I-K: 2, J-K: 7.
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths; for example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:

Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}
Each node of the problem is defined by a state X_k; for example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to the next stage. The following notations are used:

Ω^U_k(i) = {0, 1} for i = 0; {0, 1, 2} for i = 1; {1, 2} for i = 2, for k = 1, 2, 3
Ω^U_0(0) = {0, 1, 2} for k = 0
For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with u_1(0) = 0 for the transition B → E or u_1(0) = 1 for the transition B → F.

Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D → F or u_1(2) = 2 for the transition D → G.
A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined as equal to the distance from one state to the resulting state of the decision; for example, C_1(0, 0) = C(B → E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J*_0(0) = min_{U_k ∈ Ω^U_k(X_k)} [ Σ_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., N-1
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem. The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forward by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.

The solutions of the algorithm are given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A → D → G → I → K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2, μ_1(2) = 2).
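The example can be checked numerically with a short backward recursion. Only a few arc costs (A-B = 2, B-E = 4, B-F = 6, F-J = 2, J-K = 7) are stated explicitly in the text; the remaining costs below are reconstructed from the figure, so treat them as a reconstruction rather than authoritative data:

```python
# Stage-wise node lists and arc costs (partly reconstructed from the figure)
stages = [['A'], ['B', 'C', 'D'], ['E', 'F', 'G'], ['H', 'I', 'J'], ['K']]
cost = {('A','B'): 2, ('A','C'): 4, ('A','D'): 3,
        ('B','E'): 4, ('B','F'): 6,
        ('C','E'): 2, ('C','F'): 1, ('C','G'): 3,
        ('D','F'): 5, ('D','G'): 2,
        ('E','H'): 2, ('E','I'): 5,
        ('F','H'): 7, ('F','I'): 3, ('F','J'): 2,
        ('G','I'): 1, ('G','J'): 2,
        ('H','K'): 4, ('I','K'): 2, ('J','K'): 7}

J = {'K': 0}                                 # terminal cost C_N(K) = 0
best = {}
for k in range(len(stages) - 2, -1, -1):     # backward recursion over stages
    for i in stages[k]:
        succ = [j for j in stages[k + 1] if (i, j) in cost]
        best[i] = min(succ, key=lambda j: cost[(i, j)] + J[j])
        J[i] = cost[(i, best[i])] + J[best[i]]

# Read the optimal path forward from the computed decisions
path, node = ['A'], 'A'
while node != 'K':
    node = best[node]
    path.append(node)
print(J['A'], path)   # 8 ['A', 'D', 'G', 'I', 'K']
```

With these costs the recursion reproduces the optimal cost-to-go 8 and the path A → D → G → I → K stated above.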
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. It introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4: it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω^X_k.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).
Dynamic of the System and Transition Probability
In contrast to the deterministic case, the state transition depends not only on the control used but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N-1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k ∈ Ω^U_k(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) ]

subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N-1
N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
X_k: State at stage k
U_k: Decision (action) at stage k
ω_k(i, u): Probabilistic function of the disturbance
C_k(i, u, j): Cost function
C_N(i): Terminal cost for state i
f_k(i, u, ω): Dynamic function
J*_0(i): Optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is

J*_k(i) = min_{u ∈ Ω^U_k(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]   (5.1)
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
J*_k(i) = min_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(i, u, j) · [ C_k(i, u, j) + J*_{k+1}(j) ]   (5.2)
Ω^X_k: State space at stage k
Ω^U_k(i): Decision space at stage k for state i
P_k(j, u, i): Transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage; by backward recursion, it determines at each stage the optimal decision for each state of the system.
J*_N(i) = C_N(i), ∀i ∈ Ω^X_N (initialisation)

While k ≥ 0 do
  J*_k(i) = min_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(i, u, j) · [ C_k(i, u, j) + J*_{k+1}(j) ], ∀i ∈ Ω^X_k
  U*_k(i) = argmin_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(i, u, j) · [ C_k(i, u, j) + J*_{k+1}(j) ], ∀i ∈ Ω^X_k
  k ← k − 1
u: Decision variable
U*_k(i): Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
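The stochastic recursion can be sketched on a hypothetical three-state component (0 = new, 1 = worn, 2 = failed) with a "do nothing" and a "replace" action. All probabilities and costs below are invented for the illustration, not taken from the thesis:

```python
# P[u][i][j] = P_k(i, u, j), stationary over stages (hypothetical numbers):
# action 0 = do nothing (component deteriorates), action 1 = replace (back to new)
P = {0: [[0.7, 0.2, 0.1],
         [0.0, 0.6, 0.4],
         [0.0, 0.0, 1.0]],
     1: [[1.0, 0.0, 0.0]] * 3}

def C(i, u, j):
    # replacement cost 5 plus penalty 10 when the next state is "failed"
    return 5.0 * (u == 1) + 10.0 * (j == 2)

N, n_states = 10, 3
J = [0.0] * n_states                       # terminal cost C_N(i) = 0
U = []                                     # U[k][i] = optimal action
for k in range(N - 1, -1, -1):             # backward recursion
    Jk, Uk = [], []
    for i in range(n_states):
        q = {u: sum(P[u][i][j] * (C(i, u, j) + J[j]) for j in range(n_states))
             for u in (0, 1)}
        u_best = min(q, key=q.get)
        Jk.append(q[u_best])
        Uk.append(u_best)
    J, U = Jk, [Uk] + U
print(J[0])   # expected cost-to-go of a new component over 10 stages
```

As expected, the computed policy always replaces a failed component, and the cost-to-go is ordered with the deterioration level.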
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with
• N stages,

• N_X state variables; the size of the set for each state variable is S,

• N_U control variables; the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for a maintenance model based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for the component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. If there is little consumption, some generation units are stopped, and this time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing the maintenance actions of offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamic of the system only depends on the current state of the system (and possibly on time, if the system dynamic is not stationary).

This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamic of the deterioration process.
Chapter 6
Infinite Horizon Models -
Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamic of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω^X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω^U(i).

The objective is to find the optimal μ*, which minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N-1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...

μ: Decision policy
J*(i): Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost function for discounted IHSDP has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum converges (it is bounded by a decreasing geometric progression):

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N-1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...
α: Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N-1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...
6.2 Optimality Equations
The optimality equations are formulated using the transition probabilities P_ij(u).

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P_ij(u) · [ C_ij(u) + J*(j) ], ∀i ∈ Ω^X

J_μ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is

J*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P_ij(u) · [ C_ij(u) + α · J*(j) ], ∀i ∈ Ω^X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.

An alternative to the method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ_0. It can then be described by the following steps:
Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Otherwise, J_{μ_q}(i), the solution of the following linear system, is calculated:

J_{μ_q}(i) = Σ_{j ∈ Ω^X} P(j, μ_q(i), i) · [ C(j, μ_q(i), i) + J_{μ_q}(j) ]

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.
Step 2: Policy Improvement

A new policy is obtained using one step of the value iteration algorithm:

μ_{q+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + J_{μ_q}(j) ]

Go back to the policy evaluation step.

The process stops when μ_{q+1} = μ_q.
At each iteration the algorithm improves the policy. If the initial policy μ_0 is already good, then the algorithm converges quickly to the optimal solution.
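The two-step scheme can be sketched for a discounted MDP as follows. For brevity, the evaluation step uses a fixed-point iteration instead of an exact linear solve (which strictly speaking makes it a modified policy iteration); the layout P[u][i][j], C[u][i][j] and the two-state example are assumptions made for the sketch:

```python
def policy_iteration(P, C, alpha, n_states, n_actions):
    mu = [0] * n_states                            # initial policy mu_0
    while True:
        # Step 1: policy evaluation -- J = C_mu + alpha * P_mu * J,
        # approximated here by repeated substitution (a linear solver also works)
        J = [0.0] * n_states
        for _ in range(1000):
            J = [sum(P[mu[i]][i][j] * (C[mu[i]][i][j] + alpha * J[j])
                     for j in range(n_states)) for i in range(n_states)]
        # Step 2: policy improvement
        mu_new = []
        for i in range(n_states):
            q = [sum(P[u][i][j] * (C[u][i][j] + alpha * J[j])
                     for j in range(n_states)) for u in range(n_actions)]
            mu_new.append(min(range(n_actions), key=q.__getitem__))
        if mu_new == mu:                           # policy is stable: optimal
            return mu, J
        mu = mu_new

# Hypothetical two-state example: action 0 = stay, action 1 = switch state
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
# cost 1 for ending in state 1, plus 0.5 whenever the switch action is used
C = [[[0.0, 1.0], [0.0, 1.0]],
     [[0.5, 1.5], [0.5, 1.5]]]
mu, J = policy_iteration(P, C, alpha=0.9, n_states=2, n_actions=2)
# mu == [0, 1]: stay in the cheap state 0, switch out of state 1
```

Starting from the poor policy "always stay", one improvement step already reaches the optimal policy, illustrating the fast convergence noted above.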
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the true value J_{μ_k}(i).
While m ≥ 0 do
  J^m_{μ_k}(i) = Σ_{j ∈ Ω^X} P(j, μ_k(i), i) · [ C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j) ], ∀i ∈ Ω^X
  m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X ∈ Ω^X, there are a unique λ_μ and a vector h_μ such that

h_μ(X) = 0
λ_μ + h_μ(i) = Σ_{j ∈ Ω^X} P(j, μ(i), i) · [ C(j, μ(i), i) + h_μ(j) ], ∀i ∈ Ω^X

This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h*(j) ], ∀i ∈ Ω^X

μ*(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h*(j) ], ∀i ∈ Ω^X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. X is an arbitrary reference state and h^0(i) is chosen arbitrarily:
H^k = min_{u ∈ Ω^U(X)} Σ_{j ∈ Ω^X} P(j, u, X) · [ C(j, u, X) + h^k(j) ]

h^{k+1}(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h^k(j) ] − H^k, ∀i ∈ Ω^X

μ^{k+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h^k(j) ], ∀i ∈ Ω^X
The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
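A minimal sketch of relative value iteration, under the unichain assumption. The two-state run/repair model used as an example is hypothetical (its exact average cost, 1/11, can be checked by hand from the Bellman equation above):

```python
def relative_value_iteration(P, C, n_states, n_actions, x_ref=0, iters=500):
    """P[u][i][j], C[u][i][j]; returns (average cost lambda, relative values h)."""
    h = [0.0] * n_states
    for _ in range(iters):
        def backup(i):
            return min(sum(P[u][i][j] * (C[u][i][j] + h[j])
                           for j in range(n_states)) for u in range(n_actions))
        H = backup(x_ref)                       # offset at the reference state
        h = [backup(i) - H for i in range(n_states)]
    return H, h                                 # H -> lambda* at convergence

# Hypothetical model: state 0 = working, state 1 = broken.
# Action 0 = run (may break; running broken costs 2), action 1 = repair (cost 1).
P = [[[0.9, 0.1], [0.0, 1.0]],
     [[1.0, 0.0], [1.0, 0.0]]]
C = [[[0.0, 0.0], [2.0, 2.0]],
     [[1.0, 1.0], [1.0, 1.0]]]
lam, h = relative_value_iteration(P, C, n_states=2, n_actions=2)
# lam approaches 1/11: run when working, repair when broken
```

The returned offset H converges to the optimal average cost λ*, while h gives the relative values with h(X) = 0 at the reference state.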
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.
Step 1: Policy Evaluation
If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) ∀i ∈ Ω^X, stop the algorithm. Otherwise, solve the system of equations

h^q(X) = 0
λ^q + h^q(i) = Σ_{j ∈ Ω^X} P(j, μ^q(i), i) · [ C(j, μ^q(i), i) + h^q(j) ], ∀i ∈ Ω^X
Step 2: Policy Improvement

μ^{q+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h^q(j) ], ∀i ∈ Ω^X

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case,

J*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + α · J*(j) ], ∀i ∈ Ω^X

and J*(i) is the solution of the following linear programming model:

Maximize Σ_{i ∈ Ω^X} J(i)
Subject to J(i) − Σ_{j ∈ Ω^X} α · P(j, u, i) · J(j) ≤ Σ_{j ∈ Ω^X} P(j, u, i) · C(j, u, i), ∀u, ∀i
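The LP can be written down directly with `scipy.optimize.linprog` (assumed available). Since the constraints bound J from above and the Bellman operator is a minimum, J* is the largest feasible J, so the objective maximizes Σ J(i). The two-state MDP below is a hypothetical example:

```python
import numpy as np
from scipy.optimize import linprog

alpha = 0.9
P = np.array([[[1.0, 0.0], [0.0, 1.0]],    # P[u][i][j]: action 0 = stay
              [[0.0, 1.0], [1.0, 0.0]]])   # action 1 = switch state
C = np.array([[[0.0, 1.0], [0.0, 1.0]],    # C[u][i][j]: cost 1 for landing in 1,
              [[0.5, 1.5], [0.5, 1.5]]])   # plus 0.5 whenever we switch

n_states, n_actions = 2, 2
A_ub, b_ub = [], []
for i in range(n_states):
    for u in range(n_actions):
        row = -alpha * P[u][i]             # -alpha * P(j,u,i) coefficients on J(j)
        row[i] += 1.0                      # + J(i)
        A_ub.append(row)
        b_ub.append(P[u][i] @ C[u][i])     # expected one-step cost
res = linprog(c=-np.ones(n_states),        # maximize sum_i J(i)
              A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
J = res.x                                  # approximately [0.0, 0.5]
```

One can check against value or policy iteration that [0, 0.5] is indeed the optimal cost-to-go of this small model.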
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random: the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch occurs each time the state of the system changes. These kinds of problems are referred to as Semi-Markov Decision Processes (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and actions are not taken continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to be able to predict the output for any kind of possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. Section 7.2 deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly in Section 7.3.

The RL methods with function approximation are extensions of the methods presented in Section 7.2: they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided from simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.
The cost-to-go resulting from the trajectory, starting from the state X_k, is

V(X_k) = Σ_{n=k}^{N-1} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i at the m-th visit
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

where γ_{X_k} corresponds to 1/m, m being the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, and can thus only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go function J(X_k) of each state of the trajectory is updated. Assume that the l-th transition is being generated; then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ (0 ≤ λ ≤ 1) is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0; the TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
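The TD(0) update can be sketched as follows on a hypothetical stochastic shortest path (states 0 and 1 transient, state 2 terminal); the step size γ = 1/m follows the text, and the `step` function standing in for the simulated policy is an assumption of the sketch:

```python
import random

def td0_evaluate(step, n_states, terminal, n_trajectories=5000):
    """TD(0) evaluation of a fixed policy.

    step(i) -> (next_state, cost) simulates one transition under the policy."""
    J = [0.0] * n_states
    visits = [0] * n_states
    for _ in range(n_trajectories):
        i = 0                                  # each trajectory starts in state 0
        while i != terminal:
            j, c = step(i)
            visits[i] += 1
            gamma = 1.0 / visits[i]            # step size 1/m, as in the text
            J[i] += gamma * (c + J[j] - J[i])  # TD(0) update
            i = j
    return J

random.seed(0)
def step(i):                                   # hypothetical policy/dynamics
    if i == 0:
        return 1, 1.0                          # 0 -> 1, cost 1
    return (2 if random.random() < 0.5 else 1), 1.0   # 1 -> 2 w.p. 0.5, cost 1

J = td0_evaluate(step, n_states=3, terminal=2)
# J approaches [3.0, 2.0, 0.0], the true expected costs-to-go of this chain
```

Here the true values are easy to check by hand: from state 1 the expected number of remaining unit-cost steps is 2, hence J(1) = 2 and J(0) = 1 + J(1) = 3.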
Q-factors
Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement step by evaluating the Q-factors, defined by

Q_{μ_k}(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + J_{μ_k}(j) ]

Note that P(j, u, i) and C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u ∈ Ω^U(i)} Q_{μ_k}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + J*(j) ]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω^U(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + min_{v ∈ Ω^U(j)} Q*(j, v) ]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do

U_k = argmin_{u ∈ Ω^U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [ C_k + min_{u ∈ Ω^U(X_{k+1})} Q(X_{k+1}, u) ]

with γ defined as for TD.
The trade-off between exploration and exploitation. The convergence of the algorithms to the optimal solution requires that all pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
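A tabular Q-learning sketch with an ε-greedy rule implementing this exploration/exploitation trade-off. The small environment function `env` is invented for the example; state `terminal` is absorbing and cost-free:

```python
import random

def q_learning(env, n_states, n_actions, terminal,
               n_episodes=20000, epsilon=0.1):
    """env(i, u) -> (next_state, cost); returns the learned Q-factors."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for _ in range(n_episodes):
        i = 0
        while i != terminal:
            if random.random() < epsilon:              # exploration phase
                u = random.randrange(n_actions)
            else:                                      # greedy exploitation
                u = min(range(n_actions), key=lambda a: Q[i][a])
            j, c = env(i, u)
            visits[i][u] += 1
            gamma = 1.0 / visits[i][u]                 # step size as for TD
            target = c + min(Q[j])                     # sample of (7.3)
            Q[i][u] += gamma * (target - Q[i][u])
            i = j
    return Q

random.seed(1)
def env(i, u):                                         # hypothetical environment
    if i == 0:
        return (2, 3.0) if u == 0 else (1, 1.0)        # direct (3) vs detour (1)
    return (2, 1.0) if u == 0 else (2, 5.0)            # cheap vs expensive exit
Q = q_learning(env, n_states=3, n_actions=2, terminal=2)
# greedy policy: action 1 in state 0, then action 0 in state 1 (total cost 2 < 3)
```

Because Q at the terminal state is never updated, `min(Q[terminal])` stays 0, which plays the role of the zero terminal cost.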
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

– using the direct learning approach presented in the previous section for each sample of experience; or

– building on-line a model of the transition probabilities and the cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the previous section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; for large state and control spaces, however, they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jμ(i). It will be replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the tabular representation previously investigated, Jμ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) − J(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
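The steps above can be sketched for the simplest possible approximation structure: a linear model J(i, r) = r0 + r1 · i fitted by least squares to noisy samples of a cost-to-go. The "true" function Jμ(i) = 5 + 2i, the noise level and the sample range are all invented for this illustration.

```python
import random

rng = random.Random(0)
# noisy samples of a hypothetical cost-to-go J_mu(i) = 5 + 2 i
samples = [(float(i), 5.0 + 2.0 * i + rng.gauss(0.0, 0.01)) for i in range(40)]

# least-squares fit of J(i, r) = r0 + r1 * i (normal equations, two parameters)
n = float(len(samples))
sx = sum(i for i, _ in samples)
sy = sum(y for _, y in samples)
sxx = sum(i * i for i, _ in samples)
sxy = sum(i * y for i, y in samples)
r1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
r0 = (sy - r1 * sx) / n

def J_approx(i):
    return r0 + r1 * i   # only r = (r0, r1) is stored, not a table of J values

# generalization: the fitted function is evaluated at a state never sampled
print(J_approx(100.0))   # close to the true value 205
```

Only the two parameters are stored instead of a table, and the fitted function generalizes to states that were never sampled, which is exactly what the table representation cannot do.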
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training set is obtained either by simulation or from real-time samples, which is already an approximation of the real function.
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling-horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance calculated.

The MDP is solved using the policy iteration algorithm; the model is proved to be unichain before the algorithm is applied. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time lag (the time between an action and its effect) is raised. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite horizon dynamic programming
  Characteristics: model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Disadvantages: limited state space (number of components)

Markov decision processes
  Characteristics: stationary model
  Methods: classical MDP methods, with three possible approaches:
    – average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI), can converge fast for a high discount factor
    – discounted: short-term maintenance optimization; policy iteration (PI), faster in general
    – shortest path: linear programming, possible additional constraints; state space more limited than with VI and PI

Approximate dynamic programming for MDP
  Characteristics: can handle large state spaces
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages: can work without an explicit model

Semi-Markov decision processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: optimization for inspection-based maintenance
  Methods: same as MDP
  Disadvantages: complex (average cost-to-go approach)
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic pro-gramming is proposed The model is first described for one component for an easierunderstanding of its principle
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and postpone maintenance until prices are lower.

Conversely, if a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, so as to be operational later and avoid maintenance during a profitable period. This idea was incorporated in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE     Number of electricity scenarios
NW     Number of working states for the component
NPM    Number of preventive maintenance states for the component
NCM    Number of corrective maintenance states for the component

Costs
CE(s, k)   Electricity cost at stage k for electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i

Variables
i1    Component state at the current stage
i2    Electricity state at the current stage
j1    Possible component state for the next stage
j2    Possible electricity state for the next stage

State and control space
x1_k   Component state at stage k
x2_k   Electricity state at stage k

Probability functions
λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets
Ωx1      Component state space
Ωx2      Electricity state space
ΩU(i)    Decision space for state i

States notation
W     Working state
PM    Preventive maintenance state
CM    Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, …, N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption, CI per stage, is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, …, N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k)ᵀ,  x1_k ∈ Ωx1, x2_k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a PM state, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can correspond, for example, to the time after which λ(t) > 50%. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
[Figure: MDP transition graph with working states W0–W4 and maintenance states PM1, CM1, CM2; ageing transitions Wq → Wq+1 with probability 1 − Ts·λ(q), failure transitions Wq → CM1 with probability Ts·λ(q), and deterministic (probability 1) transitions through the maintenance states.]

Figure 9.1: Example of Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ωx1 = {W0, …, W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, …, WNW, PM1, …, PMNPM−1, CM1, …, CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, …, SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
[Figure: electricity price (SEK/MWh) versus stage, shown around stages k−1, k and k+1, with the three scenario curves lying roughly between 200 and 500 SEK/MWh.]

Figure 9.2: Example of electricity scenarios, NE = 3
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:
Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance
The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, …, WNW};  ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).
The transition probability for the component state is stationary It can be repre-sented as a Markov decision process as in the example in Figure 91
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                        u    j1       P(j1, u, i1)
Wq, q ∈ {0, …, NW−1}      0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, …, NW−1}      0    CM1      λ(Wq)
WNW                       0    WNW      1 − λ(WNW)
WNW                       0    CM1      λ(WNW)
Wq, q ∈ {0, …, NW}        1    PM1      1
PMq, q ∈ {1, …, NPM−2}    ∅    PMq+1    1
PMNPM−1                   ∅    W0       1
CMq, q ∈ {1, …, NCM−2}    ∅    CMq+1    1
CMNCM−1                   ∅    W0       1
Table 9.2: Example of transition matrices for electricity scenarios

P1_E = [1 0 0; 0 1 0; 0 0 1]

P2_E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]

P3_E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):     0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2):    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• reward for electricity generation, = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k);

• cost for maintenance, CCM or CPM;

• cost for interruption, CI.
Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                        u    j1       Ck(j, u, i)
Wq, q ∈ {0, …, NW−1}      0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, …, NW−1}      0    CM1      CI + CCM
WNW                       0    WNW      G · Ts · CE(i2, k)
WNW                       0    CM1      CI + CCM
Wq                        1    PM1      CI + CPM
PMq, q ∈ {1, …, NPM−2}    ∅    PMq+1    CI + CPM
PMNPM−1                   ∅    W0       CI + CPM
CMq, q ∈ {1, …, NCM−2}    ∅    CMq+1    CI + CCM
CMNCM−1                   ∅    W0       CI + CCM
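The one-component model can be solved with the value iteration algorithm by a backward recursion over the stages. A small numerical sketch with invented data follows: NW = 3 working states (the component stays in W3 with a constant failure rate), NPM = 2, NCM = 3, a single flat electricity scenario, and the generation reward counted as a negative cost.

```python
N = 12                                   # stages
TS, G, CE = 1.0, 100.0, 0.3              # stage length (h), production (kW), price/kWh
CI, CPM, CCM = 20.0, 10.0, 40.0          # interruption / maintenance costs per stage
P_FAIL = {0: 0.05, 1: 0.10, 2: 0.20, 3: 0.30}   # failure probability per stage in W_q

STATES = ["W0", "W1", "W2", "W3", "PM1", "CM1", "CM2"]

def transitions(s, u):
    """[(next state, probability, stage cost)], following Tables 9.1 and 9.4."""
    if s.startswith("W"):
        q = int(s[1])
        if u == 1:                                   # replacement: first of N_PM stages
            return [("PM1", 1.0, CI + CPM)]
        nxt = "W%d" % min(q + 1, 3)                  # ages, or stays in W3
        return [(nxt, 1.0 - P_FAIL[q], -G * TS * CE),   # production counted as reward
                ("CM1", P_FAIL[q], CI + CCM)]
    if s == "PM1":
        return [("W0", 1.0, CI + CPM)]               # maintenance done: as good as new
    if s == "CM1":
        return [("CM2", 1.0, CI + CCM)]
    return [("W0", 1.0, CI + CCM)]                   # CM2 -> repaired

def expected(s, u, J):
    return sum(p * (c + J[j]) for j, p, c in transitions(s, u))

J = {s: 0.0 for s in STATES}                         # terminal costs C_N(i) = 0
policy = []
for k in range(N - 1, -1, -1):                       # backward recursion over stages
    uk = {s: min(([0, 1] if s.startswith("W") else [0]),
                 key=lambda u: expected(s, u, J)) for s in STATES}
    J = {s: expected(s, uk[s], J) for s in STATES}
    policy.insert(0, uk)

print(J["W0"], policy[0])                            # cost-to-go of a new component
```

The resulting policy prescribes, for each stage and each working state, whether to keep operating or to start a preventive replacement; older states (higher failure rate) have a higher cost-to-go.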
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
921 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c

Costs
CPMc      Cost per stage of preventive maintenance for component c
CCMc      Cost per stage of corrective maintenance for component c
CNc(i)    Terminal cost if component c is in state i

Variables
ic, c ∈ {1, …, NC}    State of component c at the current stage
iNC+1                 Electricity state at the current stage
jc, c ∈ {1, …, NC}    State of component c for the next stage
jNC+1                 Electricity state for the next stage
uc, c ∈ {1, …, NC}    Decision variable for component c

State and control space
xc_k, c ∈ {1, …, NC}    State of component c at stage k
xc                      A component state
xNC+1_k                 Electricity state at stage k
uc_k                    Maintenance decision for component c at stage k

Probability functions
λc(i)    Failure probability function for component c

Sets
Ωxc         State space for component c
ΩxNC+1      Electricity state space
Ωuc(ic)     Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, …, NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1_k, …, xNC_k, xNC+1_k)ᵀ    (9.2)

xc_k, c ∈ {1, …, NC}, represents the state of component c, and xNC+1_k represents the electricity state.
Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space of component c is denoted Ωxc:

xc_k ∈ Ωxc = {W0, …, WNWc, PM1, …, PMNPMc−1, CM1, …, CMNCMc−1}
Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1_k, u2_k, …, uNC_k)ᵀ    (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, …, NC}:  Ωuc(ic) = {0, 1} if ic ∈ {W0, …, WNWc};  ∅ otherwise
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of the components considered independently:

If ∀c ∈ {1, …, NC}: ic, jc ∈ {W1, …, WNWc},

P((j1, …, jNC), 0, (i1, …, iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)
Case 2
If one of the components is in maintenance, or preventive maintenance is decided for a component, the transition probability is

P((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, …, WNWc}
P^c = 1              if uc = 0, ic ∈ {W1, …, WNWc} and jc = ic (the working component does not age)
P^c = 0              otherwise
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, …, NC}: ic, jc ∈ {W1, …, WNWc},

C((j1, …, jNC), 0, (i1, …, iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is incurred, as well as the sum of the maintenance costs:

C((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) = CI + Σ_{c=1}^{NC} C^c

with

C^c = CCMc   if ic ∈ {CM1, …, CMNCMc} or jc = CM1
C^c = CPMc   if ic ∈ {PM1, …, PMNPMc} or jc = PM1
C^c = 0      otherwise
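The two cases can be made concrete with a small sketch, with all numbers invented: two components A and B, each reduced to a single working state W plus one PM and one CM state, and Case 2 read as follows — a component in maintenance, or sent to maintenance, follows its own transition, while the remaining working components keep their state because the system is down. A sanity check verifies that the joint probabilities sum to one for every state/decision combination.

```python
from itertools import product

P_FAIL = {"A": 0.1, "B": 0.2}        # hypothetical per-stage failure probabilities
COMPS = ("A", "B")
STATES = ("W", "PM1", "CM1")

def p_single(c, j, u, i):
    """Single-component transition P(j, u, i) for a one-working-state component."""
    if i == "W":
        if u == 1:
            return 1.0 if j == "PM1" else 0.0
        return {"W": 1.0 - P_FAIL[c], "CM1": P_FAIL[c]}.get(j, 0.0)
    return 1.0 if j == "W" else 0.0  # PM1/CM1 last one stage, then as good as new

def p_joint(js, us, is_):
    if all(i == "W" for i in is_) and all(u == 0 for u in us):
        p = 1.0                      # Case 1: independent ageing of all components
        for c, j, i in zip(COMPS, js, is_):
            p *= p_single(c, j, 0, i)
        return p
    p = 1.0                          # Case 2: the system is down during the stage
    for c, j, u, i in zip(COMPS, js, us, is_):
        if u == 1 or i != "W":
            p *= p_single(c, j, u, i)
        else:
            p *= 1.0 if j == i else 0.0   # working component is frozen
    return p

# sanity check: for every state/decision pair, the next-state distribution sums to 1
for is_, us in product(product(STATES, repeat=2), product((0, 1), repeat=2)):
    total = sum(p_joint(js, us, is_) for js in product(STATES, repeat=2))
    assert abs(total - 1.0) < 1e-12
print("all joint distributions sum to 1")
```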
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality, there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Time to repair is non-deterministic. It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for high discount rates, the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, in the literature, few finite horizon models are proposed. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = J*(K) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4, u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2, u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7, u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Thyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Chapter 4
Introduction to Dynamic
Programming
This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

4.1 Introduction

Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action results in an immediate cost (or reward) and influences the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.
4.1.1 Principle of Optimality
Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:
An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the actual evolution of the system and the possible actions.

Basically, in maintenance problems it would mean that maintenance actions have an effect on the state of the system only directly after their accomplishment. They do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the actual state and the action made.

If a system is subject to probabilistic events, it will evolve according to a probabilistic distribution depending on the actual state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.
4.1.3 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, that is, it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. It can be a good approximation if the lifetime of the system is indeed very long.
4.1.4 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 7). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the interval time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 6. Continuous decisions refer to optimal control theory and are not discussed here.
4.1.5 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm used to solve it. The section is illustrated with a classical example: a simple shortest path problem.

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.
State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω_k^X. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω_k^U(i).

Dynamic and Cost Functions
As a result of this action, the state of the system at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost is associated with the terminal state (the state at stage N), C_N(X_N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:
J*_0(X_0) = min_{U_k} [ Σ_{k=0}^{N−1} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, ..., N−1
N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
X_k: State at stage k
U_k: Decision (action) at stage k
C_k(i, u): Cost function
C_N(i): Terminal cost for state i
f_k(i, u): Dynamic function
J*_0(i): Optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*_k(i) = min_{u∈Ω_k^U(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }   (4.1)

J*_k(i): Optimal cost-to-go from stage k to N, starting from state i
The value iteration algorithm is a direct consequence of the optimality equation
J*_N(i) = C_N(i)   ∀i ∈ Ω_N^X

J*_k(i) = min_{u∈Ω_k^U(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }   ∀i ∈ Ω_k^X

U*_k(i) = argmin_{u∈Ω_k^U(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }   ∀i ∈ Ω_k^X

u: Decision variable
U*_k(i): Optimal decision (action) at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
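The backward recursion above can be sketched in code. The following is an illustrative Python implementation (the function and argument names are ours, not from the thesis); it takes the stage-wise state spaces, the admissible decisions, and the dynamic and cost functions as inputs and returns the tables J*_k(i) and U*_k(i).

```python
def value_iteration(N, states, decisions, f, C, C_terminal):
    """Backward value iteration for a deterministic finite-horizon DP.

    N           -- number of stages (decisions are made at k = 0..N-1)
    states[k]   -- admissible states at stage k (k = 0..N)
    decisions   -- decisions(k, i): admissible actions in state i at stage k
    f           -- f(k, i, u): next state
    C           -- C(k, i, u): transition cost
    C_terminal  -- C_terminal(i): terminal cost at stage N
    Returns the tables J (optimal cost-to-go) and U (optimal decisions).
    """
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:                    # initialisation: J*_N(i) = C_N(i)
        J[N][i] = C_terminal(i)
    for k in range(N - 1, -1, -1):         # backward recursion down to k = 0
        for i in states[k]:
            best_u, best = None, float("inf")
            for u in decisions(k, i):
                cost = C(k, i, u) + J[k + 1][f(k, i, u)]
                if cost < best:
                    best_u, best = u, cost
            J[k][i] = best
            U[k][i] = best_u
    return J, U
```

On a two-stage toy problem where action u = 0 costs 1 per stage and u = 1 costs 3, the recursion returns J*_0 = 2 with u = 0 chosen at every stage, as expected.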
4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: a five-stage shortest path network. Node A forms stage 0; nodes B, C, D stage 1; E, F, G stage 2; H, I, J stage 3; and K stage 4. Each arc from one stage to the next carries a cost (a distance); the arc costs are the values C_k(i, u) used in the calculations below and in Appendix A.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of every possible path. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

Ω_0^X = {A} = {0}
Ω_1^X = {B, C, D} = {0, 1, 2}
Ω_2^X = {E, F, G} = {0, 1, 2}
Ω_3^X = {H, I, J} = {0, 1, 2}
Ω_4^X = {K} = {0}
20
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:

Ω_k^U(i) = {0, 1} for i = 0, {0, 1, 2} for i = 1, {1, 2} for i = 2, for k = 1, 2
Ω_3^U(i) = {0} for i = 0, 1, 2
Ω_0^U(0) = {0, 1, 2} for k = 0

For example, Ω_1^U(0) = Ω^U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E, or U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω_1^U(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F, or u_1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined to be equal to the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function
J*_0(0) = min_{U_k∈Ω_k^U(X_k)} [ Σ_{k=0}^{N−1} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., N−1, with N = 4
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated from the last stage and then iterated backwards until
the initial state is reached The optimal decision sequence is then obtained forwardby using the optimal solution determined by the DP algorithm for the sequence ofstates that will be visited
The solution of the algorithm is given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ*_0, μ*_1, μ*_2, μ*_3}, with μ*_k(i) = u*_k(i) (for example, μ*_1(1) = 2 and μ*_1(2) = 2).
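The example can also be checked mechanically. The sketch below (illustrative code, not part of the thesis) encodes the arc costs as reconstructed from the calculations in Appendix A, with the decision u equal to the next-stage state so that f_k(i, u) = u, and runs the backward recursion followed by a forward pass.

```python
# Arc costs C[k][(i, u)] of the shortest path example; the decision u is the
# next-stage state (f_k(i, u) = u). Stages: 0:A, 1:B,C,D, 2:E,F,G, 3:H,I,J, 4:K.
C = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},                  # A -> B, C, D
    {(0, 0): 4, (0, 1): 6,                              # B -> E, F
     (1, 0): 2, (1, 1): 1, (1, 2): 3,                   # C -> E, F, G
     (2, 1): 5, (2, 2): 2},                             # D -> F, G
    {(0, 0): 2, (0, 1): 5,                              # E -> H, I
     (1, 0): 7, (1, 1): 3, (1, 2): 2,                   # F -> H, I, J
     (2, 1): 1, (2, 2): 2},                             # G -> I, J
    {(0, 0): 4, (1, 0): 2, (2, 0): 7},                  # H, I, J -> K
]

def solve_shortest_path(C, N=4):
    """Backward value iteration; the terminal cost at stage N is zero."""
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    J[N][0] = 0                                         # J*_4(K) = 0
    for k in range(N - 1, -1, -1):
        for (i, u) in C[k]:
            cost = C[k][(i, u)] + J[k + 1][u]
            if cost < J[k].get(i, float("inf")):
                J[k][i] = cost
                U[k][i] = u
    return J, U

J, U = solve_shortest_path(C)
path = [0]                                              # forward pass from A
for k in range(4):
    path.append(U[k][path[-1]])
```

The forward pass recovers the optimal path A ⇒ D ⇒ G ⇒ I ⇒ K with cost 8, matching the result above.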
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. It introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is then not deterministic, as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω_k^X.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω_k^U(i).
Dynamics of the System and Transition Probabilities
Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N−1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
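Fixing a control for every state indeed turns the transition probabilities into an ordinary Markov chain. A small illustrative sketch (the probabilities, states and actions below are invented for the example, not taken from the thesis):

```python
# Hypothetical stationary transition probabilities P(j, u, i) for a
# two-state system (0 = working, 1 = failed) with actions 0 = wait, 1 = repair.
def P(j, u, i):
    table = {
        (0, 0): {0: 0.9, 1: 0.1},   # wait in the working state
        (1, 0): {0: 0.0, 1: 1.0},   # wait in the failed state
        (0, 1): {0: 1.0, 1: 0.0},   # repair in the working state
        (1, 1): {0: 0.8, 1: 0.2},   # repair in the failed state
    }
    return table[(i, u)][j]

# Fixing a control for every state (a policy) yields a Markov chain:
policy = {0: 0, 1: 1}               # wait while working, repair when failed
chain = [[P(j, policy[i], i) for j in (0, 1)] for i in (0, 1)]
# chain[i][j] is the probability of moving from state i to state j
```

Each row of the resulting matrix sums to one, as required of a Markov chain.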
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is paid. If the cost function is stationary, the notation is simplified to C(i, u, j).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k∈Ω_k^U(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N−1
N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
X_k: State at stage k
U_k: Decision (action) at stage k
ω_k(i, u): Probabilistic function of the disturbance
C_k(i, u, j): Cost function
C_N(i): Terminal cost for state i
f_k(i, u, ω): Dynamic function
J*_0(i): Optimal cost-to-go starting from state i
5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u∈Ω_k^U(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]   (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u∈Ω_k^U(i)} Σ_{j∈Ω_{k+1}^X} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   (5.2)

Ω_k^X: State space at stage k
Ω_k^U(i): Decision space at stage k for state i
P_k(j, u, i): Transition probability function
5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system:
J*_N(i) = C_N(i)   ∀i ∈ Ω_N^X   (Initialisation)

While k ≥ 0 do:
J*_k(i) = min_{u∈Ω_k^U(i)} Σ_{j∈Ω_{k+1}^X} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀i ∈ Ω_k^X
U*_k(i) = argmin_{u∈Ω_k^U(i)} Σ_{j∈Ω_{k+1}^X} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀i ∈ Ω_k^X
k ← k − 1
25
u: Decision variable
U*_k(i): Optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
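As an illustration of this recursion, the sketch below implements finite horizon stochastic value iteration and applies it to a deliberately small, hypothetical two-state component (0 = working, 1 = failed) with two actions (0 = do nothing, 1 = replace). All probabilities and costs are invented for the example and are not from the thesis.

```python
def stochastic_value_iteration(N, states, actions, P, C, C_terminal):
    """Finite-horizon stochastic value iteration (backward recursion).

    P(k, j, u, i) -- probability of moving to state j from state i under u
    C(k, j, u, i) -- cost of that transition
    """
    J = {i: C_terminal(i) for i in states}          # J*_N(i) = C_N(i)
    policy = [dict() for _ in range(N)]
    for k in range(N - 1, -1, -1):
        J_next, J = J, {}
        for i in states:
            best_u, best = None, float("inf")
            for u in actions(i):
                # Expected transition cost plus expected cost-to-go
                q = sum(P(k, j, u, i) * (C(k, j, u, i) + J_next[j])
                        for j in states)
                if q < best:
                    best_u, best = u, q
            J[i] = best
            policy[k][i] = best_u
    return J, policy

# Hypothetical two-state component: 0 = working, 1 = failed.
# Action 0 = do nothing, action 1 = replace (cost 40, back to working).
# A working unit fails with probability 0.3 per stage; entering the
# failed state costs 100 in downtime. All numbers are invented.
def P(k, j, u, i):
    if u == 1:                          # replace: working next stage for sure
        return 1.0 if j == 0 else 0.0
    if i == 0:                          # do nothing on a working unit
        return 0.3 if j == 1 else 0.7
    return 1.0 if j == 1 else 0.0       # a failed unit stays failed

def C(k, j, u, i):
    return (40.0 if u == 1 else 0.0) + (100.0 if j == 1 else 0.0)

J, policy = stochastic_value_iteration(
    N=3, states=[0, 1], actions=lambda i: [0, 1], P=P, C=C,
    C_terminal=lambda i: 0.0)
```

With these numbers the computed policy leaves a working unit alone and replaces a failed one at every stage; making the replacement cheaper than the expected downtime cost would tip the policy towards preventive replacement.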
5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages,
• N_X state variables, where the size of the set for each state variable is S,
• N_U control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem thus increases exponentially with the number of state or decision variables. This characteristic of SDP is called the curse of dimensionality.
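The exponential growth can be made concrete with a back-of-the-envelope count; the horizon and set sizes in the sketch below are arbitrary illustrative figures, not taken from the thesis.

```python
def vi_operation_count(N, S, A, n_x, n_u):
    """Rough operation count N * S**(2*n_x) * A**n_u for finite-horizon VI:
    for each of the N stages, each of the S**n_x states is evaluated against
    each of the A**n_u actions, summing over the S**n_x possible next states."""
    return N * S ** (2 * n_x) * A ** n_u

# Horizon of 50 stages, 10 values per state variable, 3 values per
# control variable (illustrative figures):
counts = [vi_operation_count(50, 10, 3, n_x, 1) for n_x in range(1, 5)]
# Each extra state variable multiplies the work by S**2 = 100.
```

Going from one to four state variables takes the count from 1.5e4 to 1.5e10 elementary operations, which is why exact value iteration quickly becomes impractical.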
5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for a component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be taken into account to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It would also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is, or can be, subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is no consumption, some generation units are stopped; this time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing the maintenance actions of offshore wind farms.
5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).

This condition of loss of memory is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states visited in memory. The computational price is once again very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models -
Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice, one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation

The state space, decision space, probability function, and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω_X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω_U(i).
The objective is to find the optimal policy μ*, the one that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal (or cost-free termination) state that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.
J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...

μ: Decision policy
J*(i): Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1): the cost at stage k has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum will converge (it is bounded by a decreasing geometric progression).
J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...

α: Discount factor
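The convergence of the discounted sum is just a geometric series: if every stage cost is bounded by some C_max, the discounted sum is bounded by C_max/(1−α). A quick numerical check (the values of α and C_max below are arbitrary illustrative choices):

```python
# Discount factor and cost bound (arbitrary illustrative values)
alpha, c_max = 0.9, 5.0

# Partial sums of sum_k alpha**k * c_max approach the geometric limit
partial = sum(alpha ** k * c_max for k in range(1000))
limit = c_max / (1 - alpha)   # approximately 50 here
```

After a thousand terms the partial sum is indistinguishable from the limit, which is what makes discounted cost-to-go functions well defined.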
Average cost per stage problems
Infinite horizon problems can sometimes be represented neither with a cost-free termination state nor with discounted costs.

To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize:
J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...
6.2 Optimality Equations

The optimality equations are formulated using the transition probabilities P_ij(u), the probability of moving from state i to state j under control u.

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + J*(j)]   ∀i ∈ Ω_X

J_μ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + α · J*(j)]   ∀i ∈ Ω_X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.7.
6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined to terminate the algorithm.
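As an illustration, the value iteration backup for a discounted problem can be sketched in a few lines of Python. The three-state component model below (states: as-new, worn, failed; actions: operate, replace) and all its numbers are hypothetical, chosen only to make the sketch runnable.

```python
import numpy as np

# Hypothetical discounted MDP for a single component (illustration only).
# States: 0 = as-new, 1 = worn, 2 = failed. Actions: 0 = operate, 1 = replace.
# P[u, i, j] = transition probability from state i to state j under action u.
P = np.array([
    [[0.8, 0.2, 0.0],    # operate: the component ages...
     [0.0, 0.7, 0.3],    # ...and may fail when worn
     [0.1, 0.0, 0.9]],   # a failed unit mostly stays failed
    [[1.0, 0.0, 0.0],    # replace: back to as-new from any state
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]],
])
# C[u, i] = expected one-stage cost of taking action u in state i.
C = np.array([[0.0, 0.0, 10.0],   # operating a failed unit is expensive
              [3.0, 3.0, 3.0]])   # replacement cost
alpha = 0.9                       # discount factor

def value_iteration(P, C, alpha, tol=1e-10, max_iter=100_000):
    """Iterate the Bellman backup J <- min_u [C + alpha * P J] to a fixed point."""
    J = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Q = C + alpha * (P @ J)          # Q[u, i] for all state-action pairs
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, Q.argmin(axis=0)
        J = J_new
    return J, Q.argmin(axis=0)

J_star, mu_star = value_iteration(P, C, alpha)
# With these numbers the optimal policy replaces only a failed component.
```

Each pass applies one backup to every state; the stopping test is the relative criterion mentioned above, here a simple sup-norm tolerance.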
An alternative to the method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is applied iteratively; the process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ0 and can then be described by the following steps.
Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μ_q}(i), the solution of the following linear system, is calculated:

J_{\mu_q}(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot [C(j, \mu_q(i), i) + J_{\mu_q}(j)]
q Iteration number for the policy iteration algorithm
This is the expected cost-to-go function of the system using the policy microq
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J_{\mu_q}(j)]
Go back to policy evaluation step
The process stops when microq+1 = microq
At each iteration the algorithm improves the policy. If the initial policy μ0 is already good, the algorithm converges quickly to the optimal solution.
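The two steps above can be sketched as follows for the discounted case, where the evaluation step is an exact linear solve. The three-state component MDP (states: as-new, worn, failed; actions: operate, replace) is the same hypothetical example used to illustrate value iteration.

```python
import numpy as np

# Hypothetical three-state component MDP (illustration only).
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.7, 0.3], [0.1, 0.0, 0.9]],   # action 0: operate
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],   # action 1: replace
])
C = np.array([[0.0, 0.0, 10.0], [3.0, 3.0, 3.0]])
alpha = 0.9

def policy_iteration(P, C, alpha):
    n_states = P.shape[1]
    idx = np.arange(n_states)
    mu = np.zeros(n_states, dtype=int)            # initial policy mu_0
    while True:
        # Step 1: policy evaluation -- solve the linear system
        #   J(i) = C(i, mu(i)) + alpha * sum_j P(j | i, mu(i)) J(j)
        P_mu, C_mu = P[mu, idx], C[mu, idx]
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, C_mu)
        # Step 2: policy improvement
        mu_new = (C + alpha * (P @ J)).argmin(axis=0)
        if np.array_equal(mu_new, mu):            # mu_{q+1} = mu_q: stop
            return J, mu
        mu = mu_new

J_pi, mu_pi = policy_iteration(P, C, alpha)
```

Because there are finitely many deterministic policies and each iteration does not worsen the policy, the loop terminates after finitely many improvements, as stated in the text.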
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the true value J_{μ_k}(i).
While m ≥ 0, do

J^m_{\mu_k}(i) = \sum_{j \in \Omega_X} P(j, \mu_k(i), i) \cdot [C(j, \mu_k(i), i) + J^{m+1}_{\mu_k}(j)], \quad \forall i \in \Omega_X

m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration
The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
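A minimal sketch of this approximate evaluation step, compared against the exact linear solve; the MDP numbers and the policy being evaluated are hypothetical.

```python
import numpy as np

# Hypothetical three-state component MDP (illustration only).
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.7, 0.3], [0.1, 0.0, 0.9]],   # action 0: operate
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],   # action 1: replace
])
C = np.array([[0.0, 0.0, 10.0], [3.0, 3.0, 3.0]])
alpha = 0.9
mu = np.array([0, 0, 1])        # some fixed policy mu_k to evaluate

def approx_policy_evaluation(P, C, alpha, mu, M, J_init):
    """M backups J^m = C_mu + alpha * P_mu J^{m+1} instead of a linear solve."""
    idx = np.arange(len(mu))
    P_mu, C_mu = P[mu, idx], C[mu, idx]
    J = J_init.copy()
    for _ in range(M):          # m = M-1, ..., 0
        J = C_mu + alpha * (P_mu @ J)
    return J

# Initial guess chosen higher than the true cost-to-go, as the text requires.
J_approx = approx_policy_evaluation(P, C, alpha, mu, M=300,
                                    J_init=np.full(3, 100.0))

# Exact evaluation for comparison.
idx = np.arange(3)
J_exact = np.linalg.solve(np.eye(3) - alpha * P[mu, idx], C[mu, idx])
```

Each backup contracts the error by the factor α, so the gap to the exact solve shrinks geometrically with M.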
6.6 Average Cost-to-go Problems
The methods presented in Sections 5.1-5.4 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X ∈ Ω_X, there is a unique λ_μ and vector h_μ such that

h_\mu(X) = 0

\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot [C(j, \mu(i), i) + h_\mu(j)], \quad \forall i \in \Omega_X
This λ_μ is the average cost-to-go of the stationary policy μ; it is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation
\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)], \quad \forall i \in \Omega_X

\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)], \quad \forall i \in \Omega_X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. X is an arbitrary fixed reference state, and h_0(i) is chosen arbitrarily.
H_k = \min_{u \in \Omega_U(X)} \sum_{j \in \Omega_X} P(j, u, X) \cdot [C(j, u, X) + h_k(j)]

h_{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h_k(j)] - H_k, \quad \forall i \in \Omega_X

\mu_{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h_k(j)], \quad \forall i \in \Omega_X
The sequence h_k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
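The recursion above can be sketched as follows. The component MDP is hypothetical and was chosen so that state 0 is reachable under every stationary policy, which makes the unichain condition hold; the reference state and the tolerance are arbitrary.

```python
import numpy as np

# Hypothetical component MDP (states: as-new, worn, failed;
# actions: operate, replace); unichain under every stationary policy.
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.7, 0.3], [0.1, 0.0, 0.9]],
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
C = np.array([[0.0, 0.0, 10.0], [3.0, 3.0, 3.0]])

def relative_value_iteration(P, C, ref=0, tol=1e-12, max_iter=1_000_000):
    n = P.shape[1]
    h = np.zeros(n)                       # h_0 chosen arbitrarily
    for _ in range(max_iter):
        T = (C + P @ h).min(axis=0)       # undiscounted Bellman backup
        H = T[ref]                        # backup value at the reference state
        h_new = T - H                     # subtract H_k, keeping h(ref) = 0
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    mu = (C + P @ h).argmin(axis=0)
    return H, h, mu                       # H converges to the average cost

lam, h, mu = relative_value_iteration(P, C)
```

Subtracting H_k keeps the iterates bounded; without it, the undiscounted backup would grow by roughly the average cost per iteration.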
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: X can be chosen arbitrarily.
Step 1: Evaluation of the policy

If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω_X, stop the algorithm. Else, solve the system of equations

h_q(X) = 0

\lambda_q + h_q(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot [C(j, \mu_q(i), i) + h_q(j)], \quad \forall i \in \Omega_X
Step 2 Policy improvement
\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h_q(j)], \quad \forall i \in \Omega_X

q ← q + 1
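The two steps above can be sketched by solving the evaluation system λ_q + h_q(i) = c_μ(i) + Σ_j P_μ(i, j) h_q(j) with the normalisation h_q(X) = 0 added as an extra equation. The three-state component MDP is hypothetical.

```python
import numpy as np

# Hypothetical component MDP, unichain under every stationary policy.
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.7, 0.3], [0.1, 0.0, 0.9]],
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
C = np.array([[0.0, 0.0, 10.0], [3.0, 3.0, 3.0]])

def average_cost_policy_iteration(P, C, ref=0):
    n_s = P.shape[1]
    idx = np.arange(n_s)
    mu = np.zeros(n_s, dtype=int)                 # initial policy mu_0
    while True:
        # Step 1: evaluate mu by solving
        #   lambda + h(i) = c_mu(i) + sum_j P_mu(i, j) h(j),  h(ref) = 0.
        P_mu, c_mu = P[mu, idx], C[mu, idx]
        A = np.zeros((n_s + 1, n_s + 1))
        A[:n_s, 0] = 1.0                          # column for lambda
        A[:n_s, 1:] = np.eye(n_s) - P_mu
        A[n_s, 1 + ref] = 1.0                     # normalisation h(ref) = 0
        b = np.append(c_mu, 0.0)
        z = np.linalg.lstsq(A, b, rcond=None)[0]
        lam, h = z[0], z[1:]
        # Step 2: policy improvement using h
        mu_new = (C + P @ h).argmin(axis=0)
        if np.array_equal(mu_new, mu):            # policy reproduces itself: stop
            return lam, h, mu
        mu = mu_new

lam, h, mu = average_cost_policy_iteration(P, C)
```

For a unichain model the (n+1)-equation system has a unique solution; the least-squares solve is used here simply as a robust way of solving it.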
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example in the discounted IHSDP
J_\mu(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \alpha \cdot J_\mu(j)], \quad \forall i \in \Omega_X
J_μ(i) is the solution of the following linear programming model:

Maximize \sum_{i \in \Omega_X} J_\mu(i)

Subject to J_\mu(i) \le \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \alpha \cdot J_\mu(j)], \quad \forall u \in \Omega_U(i), \; \forall i \in \Omega_X
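This LP can be sketched with `scipy.optimize.linprog` (assuming SciPy is available); since `linprog` minimizes, the objective is negated. The MDP numbers are the hypothetical three-state component example used earlier in this chapter.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical component MDP (illustration only).
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.7, 0.3], [0.1, 0.0, 0.9]],
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
C = np.array([[0.0, 0.0, 10.0], [3.0, 3.0, 3.0]])
alpha = 0.9
n_a, n_s, _ = P.shape

# Maximize sum_i J(i)  <=>  minimize -sum_i J(i), subject to
#   J(i) - alpha * sum_j P(j | i, u) J(j) <= C(i, u)   for all i, u.
A_ub, b_ub = [], []
for u in range(n_a):
    for i in range(n_s):
        row = -alpha * P[u, i]
        row[i] += 1.0
        A_ub.append(row)
        b_ub.append(C[u, i])

res = linprog(c=-np.ones(n_s), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_s)
J_lp = res.x          # should coincide with the optimal cost-to-go J*
```

At the optimum the constraints corresponding to the optimal actions are tight, which is why the LP solution is the fixed point of the Bellman operator; extra constraints on J could be appended to `A_ub`, which is the flexibility the text refers to.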
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Processes
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. These kinds of problems are referred to as Semi-Markov Decision Processes (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and actions are not taken continuously (that kind of problem belongs to optimal control theory).
SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how an SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function over a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.
The cost-to-go resulting from the trajectory starting from state X_k is

V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories have been generated and state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory following the m-th visit to state i
A recursive form of the method can be formulated:

J(i) := J(i) + \gamma \cdot [V(i_m) - J(i)], with γ = 1/m, m being the number of the trajectory.
From a trajectory point of view:

J(X_k) := J(X_k) + \gamma_{X_k} \cdot [V(X_k) - J(X_k)]

where γ_{X_k} corresponds to 1/m, m being the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, and the update can only be made once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).
At each transition of the trajectory, the cost-to-go function of the states visited so far is updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + \gamma_{X_k} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)], \quad \forall k = 0, \dots, l
TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)], \quad \forall k = 0, \dots, l
Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm updates only the current state:

J(X_k) := J(X_k) + \gamma_{X_k} \cdot [C(X_k, X_{k+1}) + J(X_{k+1}) - J(X_k)]
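The TD(0) update can be sketched on a small terminating chain under a fixed policy; the chain, its costs, and the number of simulated trajectories are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical terminating chain under a fixed policy mu: states 0 and 1 are
# transient, state 2 is the cost-free terminal state of the shortest path setup.
P_mu = np.array([[0.5, 0.3, 0.2],
                 [0.0, 0.6, 0.4],
                 [0.0, 0.0, 1.0]])
C_mu = np.array([1.0, 2.0, 0.0])   # cost of one transition out of each state
TERMINAL = 2

J = np.zeros(3)
visits = np.zeros(3)
for _ in range(20_000):                       # simulated trajectories
    x = int(rng.integers(0, 2))               # random non-terminal start state
    while x != TERMINAL:
        x_next = int(rng.choice(3, p=P_mu[x]))
        visits[x] += 1
        gamma = 1.0 / visits[x]               # step size 1/m, as in the text
        # TD(0): J(x) <- J(x) + gamma [C(x, x') + J(x') - J(x)]
        J[x] += gamma * (C_mu[x] + J[x_next] - J[x])
        x = x_next

# The exact values solve J = C_mu + P_mu J on the transient states: [5, 5, 0].
```

Unlike the batch estimate, each transition is used immediately, which is exactly the reformulation motivated above.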
Q-factors: Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_{\mu_k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J_{\mu_k}(j)]

Note that C(j, u, i) must be known.
The improved policy is

\mu_{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q_{\mu_k}(i, u)
This is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J^*(j)] \qquad (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u) \qquad (7.2)
By combining the two equations, we obtain

Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v)] \qquad (7.3)
Q^*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = \arg\min_{u \in \Omega_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot [C(X_k, X_{k+1}, U_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The exploration/exploitation trade-off: Convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
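Combining the Q-learning update with an ε-greedy exploration/exploitation rule gives the following sketch. The component MDP is hypothetical, a discount factor α is used so that no terminal state is needed, and the polynomially decaying step size is a common variation on the 1/m rule of the text (an assumption, chosen for faster convergence).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical discounted component MDP (states: as-new, worn, failed;
# actions: operate, replace).
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.7, 0.3], [0.1, 0.0, 0.9]],
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
C = np.array([[0.0, 0.0, 10.0], [3.0, 3.0, 3.0]])
alpha, eps = 0.9, 0.2

n_a, n_s = C.shape
Q = np.zeros((n_s, n_a))
counts = np.zeros((n_s, n_a))
x = 0
for _ in range(200_000):
    # epsilon-greedy trade-off: explore with probability eps, else exploit.
    u = int(rng.integers(n_a)) if rng.random() < eps else int(Q[x].argmin())
    x_next = int(rng.choice(n_s, p=P[u, x]))
    counts[x, u] += 1
    gamma = 1.0 / counts[x, u] ** 0.7      # polynomially decaying step size
    # Q-learning update, the stochastic counterpart of (7.3):
    Q[x, u] = (1 - gamma) * Q[x, u] + gamma * (C[u, x] + alpha * Q[x_next].min())
    x = x_next

mu = Q.argmin(axis=1)    # greedy policy from the learned Q-factors
```

Only observed transitions are used; the transition probabilities appear in the code solely to simulate the environment, not inside the update itself.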
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems, but for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that is optimized based on the available samples of J_μ. In the table representation investigated previously, J_μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).
There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are artificial neural networks, kernel-based methods, tree-based methods, and Bayesian statistics, for example.
A general approach to a supervised learning problem can be
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
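The parametric idea can be sketched with a linear architecture J̃(i, r) = φ(i)ᵀr fitted by least squares; the state space, the features, the "true" function, and the noise level are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented "true" cost-to-go over a 1000-state space (unknown in practice).
states = np.arange(1000)
J_true = 0.02 * states + 5.0 * np.sin(states / 200.0)

# Hand-picked input features phi(i) characterising state i.
Phi = np.column_stack([np.ones(1000),
                       states / 1000.0,
                       np.sin(states / 200.0)])

# Training set: noisy cost-to-go samples at 100 visited states, standing in
# for Monte Carlo returns observed along simulated trajectories.
visited = rng.choice(1000, size=100, replace=False)
targets = J_true[visited] + rng.normal(0.0, 0.1, size=100)

# Train: least-squares fit of r. Only these 3 numbers are stored,
# instead of a 1000-entry table.
r, *_ = np.linalg.lstsq(Phi[visited], targets, rcond=None)

J_approx = Phi @ r         # generalises to every state, visited or not
```

The fit quality hinges entirely on the feature choice, which is why the feature-selection step in the list above is usually the critical one.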
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original maintenance time of each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDPs have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is raised. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term optimization and scheduling of maintenance
- Method: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes (stationary models; classical MDP methods)
- Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor
- Discounted: short-term maintenance optimization; policy iteration (PI) is faster in general
- Shortest path: linear programming allows additional constraints, but its state space is more limited than for VI and PI

Approximate Dynamic Programming for MDP
- Characteristics: can handle large state spaces compared with classical MDP methods
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval (average cost-to-go approach)
- Possible application: optimization for inspection-based maintenance
- Method: same as MDP
- Advantages/disadvantages: complex
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and to avoid maintenance during a profitable period. This idea is incorporated in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
N_E: Number of electricity scenarios
N_W: Number of working states for the component
N_PM: Number of preventive maintenance states for one component
N_CM: Number of corrective maintenance states for one component
Costs
C_E(s, k): Electricity cost at stage k in electricity state s
C_I: Cost per stage of interruption
C_PM: Cost per stage of preventive maintenance
C_CM: Cost per stage of corrective maintenance
C_N(i): Terminal cost if the component is in state i
Variables
i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage
State and Control Space

x^1_k: Component state at stage k
x^2_k: Electricity state at stage k
Probability function
λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state W_i
Sets
Ω_{x^1}: Component state space
Ω_{x^2}: Electricity state space
Ω_U(i): Decision space for state i
States notations
W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector X_k is composed of two state variables: x^1_k for the state of the component (its age) and x^2_k for the electricity scenario; N_X = 2. The state of the system is thus represented by a vector as in (9.1):

X_k = \begin{pmatrix} x^1_k \\ x^2_k \end{pmatrix}, \quad x^1_k \in \Omega_{x^1}, \; x^2_k \in \Omega_{x^2} \qquad (9.1)
Ω_{x^1} is the set of possible states for the component, and Ω_{x^2} the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when the age Tmax is reached; in this case Tmax can, for example, correspond to the time after which λ(t) exceeds 50%. This approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
[Figure: Markov chain over the states W0, W1, W2, W3, W4, PM1, CM1, CM2, with ageing transitions Wq → Wq+1 of probability 1 − Ts·λ(q), failure transitions Wq → CM1 of probability Ts·λ(q), and deterministic maintenance transitions of probability 1.]

Figure 9.1: Example of Markov Decision Process for one component, with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented with this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure: electricity prices (SEK/MWh, roughly 200 to 500) for three scenarios over stages k−1, k, k+1.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1}  if i1 ∈ {W1, ..., WNW}
        ∅       else
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:
P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).
The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                        u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}    0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}    0    CM1      λ(Wq)
WNW                       0    WNW      1 − λ(WNW)
WNW                       0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}      1    PM1      1
PMq, q ∈ {1, ..., NPM−2}  ∅    PMq+1    1
PMNPM−1                   ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}  ∅    CMq+1    1
CMNCM−1                   ∅    W0       1
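Table 9.1 translates directly into code. The following Python fragment is an illustrative sketch only: the state encoding, the sizes NW, NPM, NCM and the failure-probability function lam are hypothetical example choices, not data from this chapter.

```python
# Illustrative sketch of Table 9.1 (one-component transition probabilities).
# A component state is a pair (kind, q): ('W', q) = working at age q;
# ('PM', q) / ('CM', q) = q-th stage of preventive/corrective maintenance.
NW, NPM, NCM = 4, 2, 3       # hypothetical sizes (as in Figure 9.1)

def lam(q):
    """Hypothetical per-stage failure probability at age q (must be < 1)."""
    return 0.05 * (q + 1)

def transition(state, u):
    """Return {next_state: probability} under decision u (0 or 1)."""
    kind, q = state
    if kind == 'W':
        if u == 1:                       # start a preventive replacement
            return {('PM', 1): 1.0} if NPM > 1 else {('W', 0): 1.0}
        age = min(q + 1, NW)             # ageing saturates at W_NW
        return {('W', age): 1.0 - lam(q), ('CM', 1): lam(q)}
    n = NPM if kind == 'PM' else NCM     # maintenance chain length
    # maintenance progresses deterministically, then the component is as new
    return {('W', 0): 1.0} if q + 1 > n - 1 else {(kind, q + 1): 1.0}
```

Each returned dictionary is a probability distribution over next states, so its values sum to one, mirroring the rows of Table 9.1.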
Table 9.2: Example of transition matrices for electricity scenarios

P1E = [1    0    0  ]     P2E = [1/3  1/3  1/3]     P3E = [0.6  0.2  0.2]
      [0    1    0  ]           [1/3  1/3  1/3]           [0.2  0.6  0.2]
      [0    0    1  ]           [1/3  1/3  1/3]           [0.2  0.2  0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
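The matrices of Table 9.2 and the schedule of Table 9.3 can be stored as plain stage-indexed data. The Python sketch below (the variable names are mine) also shows how a scenario probability distribution is propagated one stage forward:

```python
# Transition matrices for the electricity scenario state (Table 9.2)
P1E = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
P2E = [[1/3, 1/3, 1/3],
       [1/3, 1/3, 1/3],
       [1/3, 1/3, 1/3]]
P3E = [[0.6, 0.2, 0.2],
       [0.2, 0.6, 0.2],
       [0.2, 0.2, 0.6]]

# One matrix per stage on the 12-stage horizon (Table 9.3)
P_by_stage = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def scenario_step(p, k):
    """Propagate a scenario probability distribution p one stage forward."""
    P = P_by_stage[k]
    return [sum(p[i] * P[i][j] for i in range(3)) for j in range(3)]
```

For instance, a system known to be in scenario S1 at stage 3 ends the stage in S1, S2 or S3 with probabilities 0.6, 0.2 and 0.2, the first row of P3E.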
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: CCM or CPM
• Cost for interruption: CI
Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i1) for each possible terminal state i1 of the component.
Table 9.4: Transition costs

i1                        u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}    0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}    0    CM1      CI + CCM
WNW                       0    WNW      G · Ts · CE(i2, k)
WNW                       0    CM1      CI + CCM
Wq                        1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}  ∅    PMq+1    CI + CPM
PMNPM−1                   ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}  ∅    CMq+1    CI + CCM
CMNCM−1                   ∅    W0       CI + CCM
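As a companion to Table 9.4, a hedged Python sketch of the stage cost follows. The state encoding (pairs like ('W', 1)), the values of G, Ts and the cost constants are hypothetical, and the generation reward is counted here as a negative cost so that a minimizing DP can use it directly:

```python
# Illustrative stage cost for the one-component model (Table 9.4).
G, Ts = 500.0, 720.0               # hypothetical: 500 kW unit, 720 h stages
C_CM, C_PM, C_I = 10.0, 4.0, 2.0   # hypothetical costs per stage

def stage_cost(i1, u, j1, price):
    """Cost of transition i1 -> j1 under decision u; price = CE(i2, k)."""
    kind = i1[0]
    if kind == 'W' and u == 1:     # preventive replacement starts
        return C_I + C_PM
    if kind == 'W':
        if j1[0] == 'CM':          # failure during the stage
            return C_I + C_CM
        return -G * Ts * price     # G * Ts kWh produced and sold (reward)
    # ongoing preventive or corrective maintenance
    return C_I + (C_PM if kind == 'PM' else C_CM)
```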
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price for their rental can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c
Costs
CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}    State of component c at the current stage
iNC+1                   State of the electricity at the current stage
jc, c ∈ {1, ..., NC}    State of component c at the next stage
jNC+1                   State of the electricity at the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}   State of component c at stage k
xc                      A component state
xNC+1,k                 Electricity state at stage k
uck                     Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.
bull If component c fails during stage k corrective maintenance is undertaken forNCMc stages with a cost of CCMc per stage
• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is done on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)ᵀ   (9.2)

xck, c ∈ {1, ..., NC} represents the state of component c, and xNC+1,k represents the electricity state.
Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.
The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)ᵀ   (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1}  if ic ∈ {W0, ..., WNWc}
                             ∅       else
9.2.4.3 Transition Probability
The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component states transitions
The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.
Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc}:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏ (c = 1 to NC) P(jc, 0, ic)
Case 2
If one of the components is in maintenance, or preventive maintenance is decided for some component:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏ (c = 1 to NC) Pc

with Pc = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
          1             if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic
          0             else
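The two cases above can be combined into a single routine. The Python sketch below is an illustration of the case split, not the thesis implementation: the helper working() and the per-component transition functions trans_c are assumed, and the per-component model used in testing is a toy one.

```python
# Sketch of the joint component-state transition probability (Cases 1 and 2).
# trans_c[c] is a per-component transition function returning
# {next_state: probability}; states are pairs like ('W', q), ('CM', q).

def working(s):
    """A component state is 'working' if it is one of the W states."""
    return s[0] == 'W'

def joint_prob(j, u, i, trans_c):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) for the component states only."""
    system_up = all(working(ic) for ic in i) and not any(u)
    p = 1.0
    for c, (jc, uc, ic) in enumerate(zip(j, u, i)):
        if system_up:                       # Case 1: every component ages
            p *= trans_c[c](ic, 0).get(jc, 0.0)
        elif uc == 1 or not working(ic):    # Case 2: maintenance starts/continues
            p *= trans_c[c](ic, uc).get(jc, 0.0)
        elif jc == ic:                      # Case 2: idle working component frozen
            p *= 1.0
        else:
            return 0.0
    return p
```

The key design point is the first branch: all per-component factors multiply only when the whole system is up and no maintenance is decided; otherwise working components are frozen in their current state.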
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.
Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)
Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ (c = 1 to NC) Cc

with Cc = CCMc  if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
          CPMc  if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
          0     else
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.
• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Time to repair is non-deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm empirically converges fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to other fields, such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximation of a finite horizon model and requires the system to be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states, to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = CN(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3,0,0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3,1,0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3,2,0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2,0,0), J*3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*2(0) = u*(E) = argmin u ∈ {0,1} {J*3(0) + C(2,0,0), J*3(1) + C(2,0,1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2,1,0), J*3(1) + C(2,1,1), J*3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*2(1) = u*(F) = argmin u ∈ {0,1,2} {J*3(0) + C(2,1,0), J*3(1) + C(2,1,1), J*3(2) + C(2,1,2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2,2,1), J*3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*2(2) = u*(G) = argmin u ∈ {1,2} {J*3(1) + C(2,2,1), J*3(2) + C(2,2,2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1,0,0), J*2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*1(0) = u*(B) = argmin u ∈ {0,1} {J*2(0) + C(1,0,0), J*2(1) + C(1,0,1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1,1,0), J*2(1) + C(1,1,1), J*2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*1(1) = u*(C) = argmin u ∈ {0,1,2} {J*2(0) + C(1,1,0), J*2(1) + C(1,1,1), J*2(2) + C(1,1,2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1,2,1), J*2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*1(2) = u*(D) = argmin u ∈ {1,2} {J*2(1) + C(1,2,1), J*2(2) + C(1,2,2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0,0,0), J*1(1) + C(0,0,1), J*1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*0(0) = u*(A) = argmin u ∈ {0,1,2} {J*1(0) + C(0,0,0), J*1(1) + C(0,0,1), J*1(2) + C(0,0,2)} = 2
Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] A.-H. Mohamed. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006 (RAMS '06), pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of the 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: an opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006 (PMAPS 2006), International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999 (PICA '99), Proceedings of the 21st IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems: life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems: cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006 (PMAPS 2006), International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006 (NAPS 2006), 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants: introducing intelligent maintenance system. In Intelligent Control and Automation, 2006 (WCICA 2006), The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]
The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not influence the actual evolution of the system and the possible actions.

Basically, in maintenance problems it would mean that maintenance actions have an effect on the state of the system only directly after their accomplishment: they do not influence the deterioration process after they have been completed.
4.1.2 Deterministic and Stochastic Models
A system is said to be deterministic if the state at the next epoch depends only on the actual state and the action made.

If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the actual state and action choice. The system is then referred to as probabilistic, or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.
4.1.3 Time Horizon
The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 4 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.

Chapters 5 and 6 focus on models that assume an infinite time horizon. This assumption implies that a system is stationary, that is, it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of a system is indeed very long.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 3, 4 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the time interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 5. Continuous decision-making refers to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 4.2).

Methods exist for solving dynamic programming models exactly; they are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 6 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost that the decision maker has to pay, Ck(i, u). A possible terminal cost CN(XN) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:
J*0(X0) = min over Uk [ Σ (k = 0 to N−1) Ck(Xk, Uk) + CN(XN) ]

subject to Xk+1 = fk(Xk, Uk), k = 0, ..., N−1

N          Number of stages
k          Stage
i          State at the current stage
j          State at the next stage
Xk         State at stage k
Uk         Decision action at stage k
Ck(i, u)   Cost function
CN(i)      Terminal cost for state i
fk(i, u)   Dynamic function
J*0(i)     Optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*k(i) = min over u ∈ ΩUk(i) { Ck(i, u) + J*k+1(fk(i, u)) }   (4.1)

J*k(i): optimal cost-to-go from stage k to N, starting from state i.
The value iteration algorithm is a direct consequence of the optimality equation:

J*N(i) = CN(i)   ∀i ∈ ΩXN

J*k(i) = min over u ∈ ΩUk(i) { Ck(i, u) + J*k+1(fk(i, u)) }   ∀i ∈ ΩXk

U*k(i) = argmin over u ∈ ΩUk(i) { Ck(i, u) + J*k+1(fk(i, u)) }   ∀i ∈ ΩXk

u: decision variable
U*k(i): optimal decision action at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
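As an illustration, the backward recursion can be written generically in a few lines of Python. The function and argument names below are mine, not from the thesis; this is a sketch, not a reference implementation:

```python
# Generic finite-horizon value iteration (sketch of the algorithm above).
# states[k]: states at stage k (k = 0..N); actions(k, i): admissible decisions;
# f(k, i, u): dynamic function; cost(k, i, u): stage cost; terminal(i): C_N(i).

def value_iteration(N, states, actions, f, cost, terminal):
    J = {(N, i): terminal(i) for i in states[N]}     # J*_N(i) = C_N(i)
    policy = {}
    for k in range(N - 1, -1, -1):                   # backwards; stops at k = 0
        for i in states[k]:
            best_u, best_v = None, float('inf')
            for u in actions(k, i):
                v = cost(k, i, u) + J[(k + 1, f(k, i, u))]
                if v < best_v:
                    best_u, best_v = u, v
            J[(k, i)] = best_v                       # optimal cost-to-go
            policy[(k, i)] = best_u                  # argmin decision
    return J, policy
```

The dictionaries J and policy hold J*k(i) and U*k(i) for every stage and state, exactly the two quantities produced by the equations above.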
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: layered graph with Stage 0: A; Stage 1: B, C, D; Stage 2: E, F, G; Stage 3: H, I, J; Stage 4: K. The arc costs, as used in Appendix A, are:

A→B: 2, A→C: 4, A→D: 3
B→E: 4, B→F: 6
C→E: 2, C→F: 1, C→G: 3
D→F: 5, D→G: 2
E→H: 2, E→I: 5
F→H: 7, F→I: 3, F→J: 2
G→I: 1, G→J: 2
H→K: 4, I→K: 2, J→K: 7]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the costs of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7 = 17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to the next stage. The following notations are used:
Ω^U_k(i) = {0, 1} for i = 0, {0, 1, 2} for i = 1, {1, 2} for i = 2, for k = 1, 2, 3

Ω^U_0(0) = {0, 1, 2} for k = 0
For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with u_1(0) = 0 for the transition B ⇒ E or u_1(0) = 1 for the transition B ⇒ F.

Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F or u_1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined equal to the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J*_0(0) = min_{U_k ∈ Ω^U_k(X_k)} { Σ_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) }

Subject to: X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., N − 1
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.
The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2, μ_1(2) = 2).
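As a minimal sketch, the backward recursion of the example can be coded directly. The graph layout mirrors the example; the arc costs below are only chosen to be consistent with the two paths quoted in the text (A-B-F-J-K = 17, and the optimal A-D-G-I-K = 8), so the remaining costs are illustrative assumptions, not the costs of the thesis's figure.

```python
# Backward value iteration for a small deterministic shortest-path DP.
# Arc costs: consistent with A-B-F-J-K = 2+6+2+7 = 17 and J*(A) = 8,
# the other values are illustrative placeholders.
costs = {
    'A': {'B': 2, 'C': 4, 'D': 2},
    'B': {'E': 4, 'F': 6},
    'C': {'E': 3, 'F': 5, 'G': 3},
    'D': {'F': 5, 'G': 2},
    'E': {'H': 5, 'I': 4},
    'F': {'H': 7, 'I': 3, 'J': 2},
    'G': {'I': 2, 'J': 5},
    'H': {'K': 4},
    'I': {'K': 2},
    'J': {'K': 7},
}

def shortest_path(costs, terminal='K'):
    J = {terminal: 0.0}          # J*_N(terminal) = 0
    policy = {}
    pending = set(costs)
    # Process nodes backwards: a node is evaluated once all successors are known.
    while pending:
        for node in list(pending):
            succ = costs[node]
            if all(j in J for j in succ):
                # Optimality equation: J*(i) = min_u { C(i, u) + J*(f(i, u)) }
                policy[node], best = min(((j, c + J[j]) for j, c in succ.items()),
                                         key=lambda t: t[1])
                J[node] = best
                pending.remove(node)
    return J, policy

J, policy = shortest_path(costs)
# Recover the optimal path forwards from A using the stored policy.
path, node = ['A'], 'A'
while node != 'K':
    node = policy[node]
    path.append(node)
print(J['A'], path)
```

With these placeholder costs, the recursion reproduces the result quoted above: cost-to-go 8 along A ⇒ D ⇒ G ⇒ I ⇒ K.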
Chapter 5
Finite Horizon Models
In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable describing the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω^X_k.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).
Dynamics of the System and Transition Probabilities
Contrary to the deterministic case, the state transition does not depend only on the control used but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N − 1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated to each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k ∈ Ω^U_k(X_k)} E{ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) }

Subject to: X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N − 1
N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
X_k: State at stage k
U_k: Decision (action) at stage k
ω_k(i, u): Probabilistic function of the disturbance
C_k(j, u, i): Cost function
C_N(i): Terminal cost for state i
f_k(i, u, ω): Dynamic function
J*_0(i): Optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u ∈ Ω^U_k(i)} E{ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) }   (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
J*_k(i) = min_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   (5.2)

Ω^X_k: State space at stage k
Ω^U_k(i): Decision space at stage k for state i
P_k(j, u, i): Transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system:
J*_N(i) = C_N(i)   ∀i ∈ Ω^X_N   (Initialisation)

While k ≥ 0 do:

J*_k(i) = min_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀i ∈ Ω^X_k

U*_k(i) = argmin_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀i ∈ Ω^X_k

k ← k − 1
u: Decision variable
U*_k(i): Optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
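The backward recursion above can be sketched compactly with arrays. The data below is a hypothetical 2-state component model (0 = working, 1 = failed) with two actions (0 = do nothing, 1 = replace) and assumed costs; it is only an illustration, not a model from the thesis.

```python
import numpy as np

def value_iteration(P, C, CN):
    """Finite-horizon stochastic value iteration.
    P[k][u,i,j] = P_k(j|i,u), C[k][u,i,j] = C_k(j,u,i), CN[i] = terminal cost."""
    J = np.asarray(CN, dtype=float)           # J*_N(i) = C_N(i)
    policies = []
    for k in range(len(P) - 1, -1, -1):       # backwards: k = N-1, ..., 0
        # Q[u,i] = sum_j P_k(j|i,u) * (C_k(j,u,i) + J*_{k+1}(j))
        Q = np.einsum('uij,uij->ui', P[k], C[k] + J[None, None, :])
        policies.append(Q.argmin(axis=0))     # U*_k(i)
        J = Q.min(axis=0)                     # J*_k(i)
    policies.reverse()
    return J, policies

# Stationary placeholder data, N = 3 stages.
P_stage = np.array([[[0.8, 0.2],   # do nothing, from working
                     [0.0, 1.0]],  # do nothing, from failed
                    [[1.0, 0.0],   # replace, from working
                     [1.0, 0.0]]]) # replace, from failed
C_stage = np.array([[[0.0, 0.0],   # no cost while working
                     [10., 10.]],  # downtime cost while failed
                    [[5.0, 5.0],   # replacement cost
                     [5.0, 5.0]]])
N = 3
J0, policy = value_iteration([P_stage] * N, [C_stage] * N, CN=[0.0, 10.0])
print(J0)        # expected cost-to-go from each initial state
print(policy[0]) # optimal action at stage 0 for each state
```

For this toy data, the optimal stage-0 decision is to do nothing while the component works and to replace it once failed.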
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:

• N stages

• N_X state variables; the size of the set for each state variable is S

• N_U control variables; the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent, or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is, or can be, subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on time, if the system dynamics are not stationary).

This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added in the DP model to keep the preceding states in memory. The computational price is once again very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models -
Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details, and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function, and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω^X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω^U(i).

The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_μ E{ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) }

Subject to: X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1

μ: Decision policy
J*(i): Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The cost incurred at stage k for the transition (i, j) has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum converges (decreasing geometric progression).

J*(X_0) = min_μ E{ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) }

Subject to: X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1

α: Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:

J* = min_μ E{ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) }

Subject to: X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1
6.2 Optimality Equations
The optimality equations are formulated using the transition probabilities, written below as P_ij(u) = P(j, u, i) and C_ij(u) = C(j, u, i).

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P_ij(u) · [C_ij(u) + J*(j)]   ∀i ∈ Ω^X

J_μ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P_ij(u) · [C_ij(u) + α · J*(j)]   ∀i ∈ Ω^X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined for the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ_0. Then it can be described by the following steps:
Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μ_q}(i), the solution of the following linear system, is calculated:

J_{μ_q}(i) = Σ_{j ∈ Ω^X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_{μ_q}(j)]   ∀i ∈ Ω^X

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.
Step 2: Policy Improvement

A new policy is obtained by one value iteration step:

μ_{q+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + J_{μ_q}(j)]   ∀i ∈ Ω^X

Go back to the policy evaluation step.
The process stops when μ_{q+1} = μ_q.
At each iteration the algorithm always improves the policy. If the initial policy μ_0 is already good, then the algorithm will converge quickly to the optimal solution.
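The two-step scheme can be sketched for the discounted variant (the discount factor guarantees the evaluation system is always solvable). The data is a hypothetical 2-state repair model (0 = working, 1 = failed; actions 0 = do nothing, 1 = replace) with assumed probabilities and costs, not an example from the thesis.

```python
import numpy as np

def policy_iteration(P, C, alpha):
    """Policy iteration for a discounted stationary MDP.
    P[u,i,j] = P(j,u,i), C[u,i,j] = C(j,u,i), 0 < alpha < 1."""
    A, S, _ = P.shape
    mu = np.zeros(S, dtype=int)               # initial policy mu_0
    while True:
        # Step 1: policy evaluation -- solve J = c_mu + alpha * P_mu J
        P_mu = P[mu, np.arange(S)]            # S x S transition matrix under mu
        c_mu = np.einsum('ij,ij->i', P_mu, C[mu, np.arange(S)])
        J = np.linalg.solve(np.eye(S) - alpha * P_mu, c_mu)
        # Step 2: policy improvement (one value iteration step)
        Q = np.einsum('uij,uij->ui', P, C + alpha * J[None, None, :])
        mu_next = Q.argmin(axis=0)
        if np.array_equal(mu_next, mu):       # policy is its own improvement
            return J, mu
        mu = mu_next

P = np.array([[[0.9, 0.1], [0.0, 1.0]],   # do nothing
              [[1.0, 0.0], [1.0, 0.0]]])  # replace
C = np.array([[[0.0, 0.0], [10., 10.]],   # downtime cost while failed
              [[5.0, 5.0], [5.0, 5.0]]])  # replacement cost
J, mu = policy_iteration(P, C, alpha=0.9)
print(mu)   # optimal stationary action per state
```

Starting from the "never maintain" policy, the algorithm converges in two improvement steps to the stationary policy "replace only when failed" for this data.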
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i):
While m ≥ 0 do:

J^m_{μ_k}(i) = Σ_{j ∈ Ω^X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j)]   ∀i ∈ Ω^X

m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
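The truncated evaluation can be sketched as follows, for the discounted case, on a hypothetical 2-state repair model (assumed data, not from the thesis): M value-iteration sweeps under a fixed policy are compared against the exact solution of the linear system.

```python
import numpy as np

# Truncated policy evaluation (the inner loop of modified policy iteration).
# P[u,i,j] = P(j,u,i), C[u,i,j] = C(j,u,i); illustrative 2-state data.
P = np.array([[[0.9, 0.1], [0.0, 1.0]],   # do nothing
              [[1.0, 0.0], [1.0, 0.0]]])  # replace
C = np.array([[[0.0, 0.0], [10., 10.]],
              [[5.0, 5.0], [5.0, 5.0]]])
alpha = 0.9
mu = np.array([0, 1])                       # fixed policy to evaluate

S = P.shape[1]
P_mu = P[mu, np.arange(S)]
c_mu = np.einsum('ij,ij->i', P_mu, C[mu, np.arange(S)])

# Exact evaluation for reference: J = (I - alpha * P_mu)^-1 c_mu
J_exact = np.linalg.solve(np.eye(S) - alpha * P_mu, c_mu)

# Truncated evaluation: start above the true value and apply M sweeps.
J = np.full(S, 100.0)                       # initial guess, chosen >= J_mu
M = 50
for m in range(M):
    J = c_mu + alpha * (P_mu @ J)

print(np.abs(J - J_exact).max())            # residual error shrinks like alpha^M
```

Each sweep contracts the error by at least the factor α, so M trades accuracy against the cost of an exact linear solve.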
6.6 Average Cost-to-go Problems
The methods presented in Sections 6.2–6.5 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a reference state X̄ ∈ Ω^X, there is a unique λ_μ and vector h_μ such that:

h_μ(X̄) = 0

λ_μ + h_μ(i) = Σ_{j ∈ Ω^X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)]   ∀i ∈ Ω^X
This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost λ* and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ Ω^X

μ*(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ Ω^X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X̄ is an arbitrary reference state and h^0(i) is chosen arbitrarily:

H^k = min_{u ∈ Ω^U(X̄)} Σ_{j ∈ Ω^X} P(j, u, X̄) · [C(j, u, X̄) + h^k(j)]

h^{k+1}(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + h^k(j)] − H^k   ∀i ∈ Ω^X

μ^{k+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + h^k(j)]   ∀i ∈ Ω^X
The sequence h^k will converge if the Markov decision process is unichain, and H^k converges to the optimal average cost. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
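The iteration can be sketched on a hypothetical 2-state repair model (0 = working, 1 = failed; actions 0 = do nothing, 1 = replace), with assumed probabilities and costs and reference state X̄ = 0; the data is illustrative, not from the thesis.

```python
import numpy as np

# Relative value iteration for the average cost-to-go criterion.
# P[u,i,j] = P(j,u,i), C[u,i,j] = C(j,u,i); illustrative 2-state data.
P = np.array([[[0.9, 0.1], [0.0, 1.0]],   # do nothing
              [[1.0, 0.0], [1.0, 0.0]]])  # replace
C = np.array([[[0.0, 0.0], [10., 10.]],   # downtime cost while failed
              [[5.0, 5.0], [5.0, 5.0]]])  # replacement cost

def relative_value_iteration(P, C, x_bar=0, n_iter=500):
    S = P.shape[1]
    h = np.zeros(S)
    for _ in range(n_iter):
        # (T h)(i) = min_u sum_j P(j,u,i) * [C(j,u,i) + h(j)]
        Th = np.einsum('uij,uij->ui', P, C + h[None, None, :]).min(axis=0)
        H = Th[x_bar]                  # H^k, the offset at the reference state
        h = Th - H                     # h^{k+1}(i); h(x_bar) stays 0
    lam = H                            # converges to the optimal average cost
    mu = np.einsum('uij,uij->ui', P, C + h[None, None, :]).argmin(axis=0)
    return lam, h, mu

lam, h, mu = relative_value_iteration(P, C)
print(round(lam, 4), mu)   # average cost per stage and greedy policy
```

For this data the optimal policy is to replace only on failure, with an average cost per stage of 5/11 ≈ 0.4545 (replacement cost 5 paid in the ~1/11 fraction of stages spent in the failed state).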
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: the reference state X̄ can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω^X, stop the algorithm.

Else, solve the system of equations:

h_q(X̄) = 0

λ_q + h_q(i) = Σ_{j ∈ Ω^X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)]   ∀i ∈ Ω^X

Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + h_q(j)]   ∀i ∈ Ω^X

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case:

J*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + α · J*(j)]   ∀i ∈ Ω^X

J*(i) is the solution of the following linear programming model:

Maximize: Σ_{i ∈ Ω^X} J(i)

Subject to: J(i) ≤ Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + α · J(j)]   ∀i ∈ Ω^X, ∀u ∈ Ω^U(i)
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Processes
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch can occur each time the state of the system changes. These kinds of problems refer to Semi-Markov Decision Processes (SMDPs).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided from simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i), if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.
The cost-to-go resulting from the trajectory starting from the state X_k is:

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} V_m(i)

V_m(i): Cost-to-go of the trajectory starting from state i at its mth visit
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V_m(i) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).

At each transition of the trajectory, the cost-to-go function is updated for the states already visited. Assume that the lth transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]   ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]   ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
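The TD(0) update can be sketched on a hypothetical 3-state stochastic shortest path chain (state 2 is the cost-free terminal state); the chain, policy, and costs are illustrative assumptions, not from the thesis.

```python
import random

def step(state):
    """Simulate one transition under the fixed policy: returns (next_state, cost)."""
    if state == 0:
        return (1, 1.0) if random.random() < 0.5 else (0, 1.0)
    return (2, 2.0)                       # state 1 always moves to the terminal

def td0_evaluate(n_trajectories=20000, seed=0):
    random.seed(seed)
    J = [0.0, 0.0, 0.0]
    visits = [0, 0, 0]
    for _ in range(n_trajectories):
        state = 0
        while state != 2:
            nxt, cost = step(state)
            visits[state] += 1
            gamma = 1.0 / visits[state]   # step size 1/m, m = visits so far
            # TD(0): J(X_k) += gamma * [C(X_k, X_k+1) + J(X_k+1) - J(X_k)]
            J[state] += gamma * (cost + J[nxt] - J[state])
            state = nxt
    return J

J = td0_evaluate()
# Exact values: J(1) = 2, and J(0) = 1 + 0.5*J(0) + 0.5*J(1), i.e. J(0) = 4
print(J)
```

The estimates approach the exact cost-to-go values as the number of simulated trajectories grows.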
Q-factors
Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Q_{μ_k}(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + J_{μ_k}(j)]

Note that P(j, u, i) and C(j, u, i) must be known. The improved policy is:

μ_{k+1}(i) = argmin_{u ∈ Ω^U(i)} Q_{μ_k}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω^U(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain:

Q*(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [C(j, u, i) + min_{v ∈ Ω^U(j)} Q*(j, v)]   (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do:

U_k = argmin_{u ∈ Ω^U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω^U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The trade-off between exploration and exploitation: The convergence of the algorithm to the optimal solution would require that all the pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
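A common way to implement this trade-off is an ε-greedy rule: with a small probability ε a random action is explored, otherwise the current greedy action is exploited. The sketch below applies it to a hypothetical 2-state repair model (0 = working, 1 = failed; actions 0 = do nothing, 1 = replace) in the discounted setting; the model is only used to simulate samples, and the learner never sees P or C directly. All data is assumed for illustration.

```python
import random

P = {(0, 0): [0.9, 0.1], (1, 0): [0.0, 1.0],   # do nothing
     (0, 1): [1.0, 0.0], (1, 1): [1.0, 0.0]}   # replace

def simulate(i, u):
    """One Monte Carlo sample (next state, cost) from the hidden model."""
    j = 0 if random.random() < P[(i, u)][0] else 1
    cost = 5.0 if u == 1 else (10.0 if i == 1 else 0.0)
    return j, cost

def q_learning(alpha=0.9, epsilon=0.1, n_samples=200000, seed=1):
    random.seed(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    i = 0
    for _ in range(n_samples):
        # Exploration/exploitation trade-off: random action with prob. epsilon,
        # otherwise the greedy action argmin_u Q(i, u).
        if random.random() < epsilon:
            u = random.randrange(2)
        else:
            u = min((0, 1), key=lambda a: Q[i][a])
        j, cost = simulate(i, u)
        visits[i][u] += 1
        gamma = 1.0 / visits[i][u]     # decreasing step size, as for TD
        # Q-learning update based on equation (7.3), discounted variant:
        Q[i][u] += gamma * (cost + alpha * min(Q[j]) - Q[i][u])
        i = j
    return Q

Q = q_learning()
greedy = [min((0, 1), key=lambda a: Q[i][a]) for i in (0, 1)]
print(greedy)   # learned greedy policy per state
```

For this data the learned greedy policy is to do nothing while the component works and to replace it once failed.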
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system, through simulation, with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces, this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the tabular representation investigated previously, J_μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics, for example.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the supervised learning performed in reinforcement learning is that a real training set does not exist. The training set is obtained either by simulation or from real-time samples, which is already an approximation of the real function.
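A minimal sketch of such a function approximation, assuming a linear architecture J̃(i, r) = φ(i)ᵀr with a hypothetical polynomial feature map φ; the "true" cost-to-go used to generate samples is an illustrative stand-in for values obtained by simulation:

```python
import numpy as np

def phi(i):
    """Feature vector for state i (polynomial features of the state index)."""
    return np.array([1.0, i, i**2])

# Stand-in for sampled J_mu values (illustrative assumption)
true_J = lambda i: 3.0 + 0.5 * i + 0.01 * i**2

# Training set: samples of J_mu on a subset of the states
states = np.arange(0, 100, 5)
A = np.vstack([phi(i) for i in states])
b = np.array([true_J(i) for i in states])

# Fit the parameter vector r by least squares; only r is stored, not a table
r, *_ = np.linalg.lstsq(A, b, rcond=None)

def J_approx(i):
    """Approximated cost-to-go J_tilde(i, r) = phi(i) . r"""
    return float(phi(i) @ r)
```

Because only r is stored, J̃ can be evaluated at states never seen in the training set, which is the point of generalizing over a large state space.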
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original maintenance time of each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating-unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process model of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
44
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.
Several Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solved with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods (characteristics, possible application in maintenance optimization, method, advantages/disadvantages)

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application: short-term maintenance scheduling
- Method: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model; classical methods possible for MDP
- Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI) can converge fast for a high discount factor
- Discounted: short-term maintenance optimization; Policy Iteration (PI) is faster in general
- Shortest path: Linear Programming allows possible additional constraints, but the state space is more limited than for VI and PI

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application: optimization for inspection-based maintenance
- Method: same as MDP
- Advantages/disadvantages: complex (average cost-to-go approach)

Approximate Dynamic Programming
- Characteristics: can handle large state spaces compared with classical MDP methods
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages/disadvantages: can work without an explicit model
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

NE: Number of electricity scenarios
NW: Number of working states for the component
NPM: Number of preventive maintenance states for the component
NCM: Number of corrective maintenance states for the component

Costs

CE(s, k): Electricity cost at stage k for electricity state s
CI: Cost per stage for interruption
CPM: Cost per stage of preventive maintenance
CCM: Cost per stage of corrective maintenance
CN(i): Terminal cost if the component is in state i

Variables

i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage

State and Control Space

x1k: Component state at stage k
x2k: Electricity state at stage k

Probability function

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state Wi

Sets

Ωx1: Component state space
Ωx2: Electricity state space
ΩU(i): Decision space for state i

States notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component are respectively NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; Tmax can then correspond, for example, to the time when λ(t) > 50% for t > Tmax. This latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
[Figure 9.1: Example of a Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1. Working state Wq advances to Wq+1 with probability (1 − Ts·λ(q)) and fails to CM1 with probability Ts·λ(q); the maintenance states CM2 → CM1 and PM1 lead back to W0 with probability 1.]
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2: Example of electricity scenarios, NE = 3. Electricity prices (SEK/MWh, roughly 200–500) over stages k−1, k, k+1 for scenarios 1–3.]
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
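This factorization can be illustrated numerically: the joint transition matrix of two independent chains is the Kronecker product of the individual matrices. The sketch below uses toy values (the component matrix is for a fixed control u; both matrices are illustrative assumptions):

```python
import numpy as np

# Component chain for a fixed control u (toy values)
P_comp = np.array([[0.9, 0.1],
                   [0.0, 1.0]])
# Electricity chain at stage k (toy values)
P_elec = np.array([[0.6, 0.2, 0.2],
                   [0.2, 0.6, 0.2],
                   [0.2, 0.2, 0.6]])

# Joint transition matrix over (component, electricity) pairs:
# entry ((i1,i2),(j1,j2)) equals P_comp[i1,j1] * P_elec[i2,j2]
P_joint = np.kron(P_comp, P_elec)    # 6 x 6

# Every row of the joint matrix is still a probability distribution
assert np.allclose(P_joint.sum(axis=1), 1.0)
```

Working with the two factors separately, as the model does, avoids ever forming the full joint matrix.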
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).
The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          | u | j1    | P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      | 0 | Wq+1  | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      | 0 | CM1   | λ(Wq)
WNW                         | 0 | WNW   | 1 − λ(WNW)
WNW                         | 0 | CM1   | λ(WNW)
Wq, q ∈ {0, ..., NW}        | 1 | PM1   | 1
PMq, q ∈ {1, ..., NPM−2}    | ∅ | PMq+1 | 1
PMNPM−1                     | ∅ | W0    | 1
CMq, q ∈ {1, ..., NCM−2}    | ∅ | CMq+1 | 1
CMNCM−1                     | ∅ | W0    | 1
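The transition probabilities of Table 9.1 can be assembled into a matrix for the no-maintenance decision (u = 0), here for the sizes of the example of Figure 9.1; the failure-rate values lam[q] = λ(Wq) are illustrative assumptions:

```python
import numpy as np

NW, NPM, NCM = 4, 2, 3
lam = [0.05, 0.10, 0.15, 0.20, 0.30]          # lambda(W_0) ... lambda(W_NW), toy values

# State ordering: W_0..W_NW, PM_1..PM_{NPM-1}, CM_1..CM_{NCM-1}
states = [f"W{q}" for q in range(NW + 1)] \
       + [f"PM{q}" for q in range(1, NPM)] \
       + [f"CM{q}" for q in range(1, NCM)]
idx = {s: n for n, s in enumerate(states)}
N = len(states)

P0 = np.zeros((N, N))                          # transition matrix for u = 0
for q in range(NW):                            # working states age or fail
    P0[idx[f"W{q}"], idx[f"W{q+1}"]] = 1 - lam[q]
    P0[idx[f"W{q}"], idx["CM1"]] = lam[q]
P0[idx[f"W{NW}"], idx[f"W{NW}"]] = 1 - lam[NW]   # oldest state stays (constant rate)
P0[idx[f"W{NW}"], idx["CM1"]] = lam[NW]
for q in range(1, NPM - 1):                    # maintenance continues deterministically
    P0[idx[f"PM{q}"], idx[f"PM{q+1}"]] = 1
P0[idx[f"PM{NPM-1}"], idx["W0"]] = 1           # replacement finished: as good as new
for q in range(1, NCM - 1):
    P0[idx[f"CM{q}"], idx[f"CM{q+1}"]] = 1
P0[idx[f"CM{NCM-1}"], idx["W0"]] = 1
```

Checking that every row sums to one is a cheap sanity test when building such matrices.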
Table 9.2: Example of transition matrices for electricity scenarios

P1E = [1 0 0; 0 1 0; 0 0 1]

P2E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]

P3E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):   0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):  P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from the required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                          | u | j1    | Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      | 0 | Wq+1  | G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      | 0 | CM1   | CI + CCM
WNW                         | 0 | WNW   | G · Ts · CE(i2, k)
WNW                         | 0 | CM1   | CI + CCM
Wq                          | 1 | PM1   | CI + CPM
PMq, q ∈ {1, ..., NPM−2}    | ∅ | PMq+1 | CI + CPM
PMNPM−1                     | ∅ | W0    | CI + CPM
CMq, q ∈ {1, ..., NCM−2}    | ∅ | CMq+1 | CI + CCM
CMNCM−1                     | ∅ | W0    | CI + CCM
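Putting the pieces of the one-component model together, the value iteration algorithm is the backward recursion over the stages. The sketch below minimizes costs, so the production reward enters with a negative sign; all numbers (failure rates, costs, prices, horizon, stationary electricity transitions) are illustrative assumptions, not values from the thesis:

```python
import numpy as np

NW, NPM, NCM, NE, N = 3, 2, 2, 2, 12
Ts, G = 1.0, 10.0                      # stage length [h], average production [kW]
CI, CPM, CCM = 5.0, 2.0, 8.0           # interruption / maintenance costs per stage
lam = [0.02, 0.05, 0.10, 0.20]         # lambda(W_0) .. lambda(W_NW), toy values
CE = lambda s, k: [0.3, 0.5][s]        # price per kWh for scenario s (toy, constant in k)
PE = np.array([[0.8, 0.2],             # stationary electricity transitions (assumption)
               [0.2, 0.8]])

comp = [f"W{q}" for q in range(NW + 1)] + ["PM1", "CM1"]   # NPM = NCM = 2
idx = {s: n for n, s in enumerate(comp)}

def transitions(i, u):
    """(next state, probability, maintenance cost) as in Tables 9.1 and 9.4;
    a cost of None means the production reward applies instead."""
    if i.startswith("W"):
        q = int(i[1:])
        if u == 1:
            return [("PM1", 1.0, CI + CPM)]
        nxt = f"W{min(q + 1, NW)}"
        return [(nxt, 1 - lam[q], None), ("CM1", lam[q], CI + CCM)]
    return [("W0", 1.0, CI + (CPM if i.startswith("PM") else CCM))]

J = np.zeros((len(comp), NE))          # terminal cost C_N = 0 (assumption)
for k in range(N - 1, -1, -1):         # backward recursion
    Jnew = np.zeros_like(J)
    for i in comp:
        for s in range(NE):
            best = np.inf
            for u in ([0, 1] if i.startswith("W") else [None]):
                v = 0.0
                for j, p, c in transitions(i, 0 if u is None else u):
                    stage_cost = c if c is not None else -G * Ts * CE(s, k)
                    v += p * sum(PE[s, s2] * (stage_cost + J[idx[j], s2])
                                 for s2 in range(NE))
                best = min(best, v)
            Jnew[idx[i], s] = best
    J = Jnew
```

After the loop, J[idx[i], s] is the optimal expected cost-to-go from stage 0 in component state i and electricity scenario s; a working new component should come out cheaper (more negative) than a component under corrective maintenance.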
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would soon require maintenance.
This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rent can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers

NC: Number of components
NWc: Number of working states for component c
NPMc: Number of preventive maintenance states for component c
NCMc: Number of corrective maintenance states for component c

Costs

CPMc: Cost per stage of preventive maintenance for component c
CCMc: Cost per stage of corrective maintenance for component c
CNc(i): Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}: State of component c at the current stage
iNC+1: Electricity state at the current stage
jc, c ∈ {1, ..., NC}: State of component c at the next stage
jNC+1: Electricity state at the next stage
uc, c ∈ {1, ..., NC}: Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}: State of component c at stage k
xc: A component state
xNC+1,k: Electricity state at stage k
uck: Maintenance decision for component c at stage k

Probability functions

λc(i): Failure probability function for component c

Sets

Ωxc: State space for component c
ΩxNC+1: Electricity state space
Ωuc(ic): Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)   (9.2)

where xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component space

The numbers of CM and PM states for component c are respectively NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, ∅ otherwise
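The set of feasible decision vectors (9.3) is the Cartesian product of the per-component decision spaces. A small sketch, where the function name and the example component states are illustrative assumptions:

```python
from itertools import product

def decision_space(component_states):
    """All feasible decision vectors U_k: a binary preventive-maintenance
    decision per working component, no decision (None) for components
    that are under maintenance."""
    per_component = [((0, 1) if s.startswith("W") else (None,))
                     for s in component_states]
    return list(product(*per_component))

# Two working components and one failed component:
# 2 * 1 * 2 = 4 feasible decision vectors
U = decision_space(["W2", "CM1", "W0"])
```

The size of this product grows as 2^(number of working components), which is one concrete face of the curse of dimensionality discussed in Chapter 8.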
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏ c=1..NC P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or a decision of preventive maintenance is taken:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏ c=1..NC Pc

with Pc =
  P(jc, 1, ic) if uc = 1 or ic ∉ {W1, ..., WNWc}
  1 if ic ∈ {W0, ..., WNWc} and jc = ic
  0 otherwise
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided, and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑ c=1..NC Cc

with Cc =
  CCMc if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
  CPMc if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
  0 otherwise
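The two-case structure of the joint component transition probability can be sketched directly. This is an illustrative assumption-laden toy: P1 stands for the one-component probabilities of Table 9.1 with NPM = NCM = 2 (so PM1 and CM1 lead straight back to W0) and a single failure rate:

```python
def prod_of(xs):
    r = 1.0
    for x in xs:
        r *= x
    return r

def P1(j, u, i, lam=0.1):
    """Toy one-component transition probability (NPM = NCM = 2)."""
    if u == 1 or not i.startswith("W"):          # maintenance transition
        return 1.0 if j == ("PM1" if u == 1 else "W0") else 0.0
    q = int(i[1:])                               # working: age or fail
    if j == f"W{q + 1}":
        return 1.0 - lam
    return lam if j == "CM1" else 0.0

def P_joint(j_vec, u_vec, i_vec):
    """Joint component transition probability, cases 1 and 2."""
    all_working = all(i.startswith("W") for i in i_vec)
    no_maintenance = all(u == 0 for u in u_vec if u is not None)
    if all_working and no_maintenance:           # case 1: independent ageing
        return prod_of(P1(j, 0, i) for j, i in zip(j_vec, i_vec))
    total = 1.0                                  # case 2: system is down
    for j, u, i in zip(j_vec, u_vec, i_vec):
        if u == 1 or not i.startswith("W"):      # maintained components move
            total *= P1(j, u, i)
        else:
            total *= 1.0 if j == i else 0.0      # idle working components do not age
    return total
```

In case 2, the key modelling point survives in one line: a working component that is not maintained keeps its state with probability 1, since the system is not running.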
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:
• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space rather than an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality, there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: it is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate, the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities over a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The ADP methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields, such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
4.1.4 Decision Time
In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 3, 4 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the interval of time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The last two possibilities are shortly investigated in Chapter 5. Continuous decision making refers to optimal control theory and will not be discussed here.
4.1.5 Exact and Approximation Methods
Dynamic programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].
4.2 Deterministic Dynamic Programming
This section introduces the basics of deterministic dynamic programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example: a simple shortest path problem.
4.2.1 Problem Formulation
The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω^X_k. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω^U_k(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).
Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J*_0(X_0) = min_{U_k} [ Σ_{k=0}^{N−1} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, ..., N − 1
N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
X_k: State at stage k
U_k: Decision (action) at stage k
C_k(i, u): Cost function
C_N(i): Terminal cost for state i
f_k(i, u): Dynamic function
J*_0(i): Optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*_k(i) = min_{u∈Ω^U_k(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }   (4.1)

J*_k(i): Optimal cost-to-go from stage k to N, starting from state i
The value iteration algorithm is a direct consequence of the optimality equation:

J*_N(i) = C_N(i)   ∀i ∈ Ω^X_N

J*_k(i) = min_{u∈Ω^U_k(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }   ∀i ∈ Ω^X_k

U*_k(i) = argmin_{u∈Ω^U_k(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }   ∀i ∈ Ω^X_k

u: Decision variable
U*_k(i): Optimal decision (action) at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: a directed graph with node A at stage 0; nodes B, C, D at stage 1; nodes E, F, G at stage 2; nodes H, I, J at stage 3; and node K at stage 4. Arcs connect nodes of consecutive stages and are labelled with their costs.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths; for example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: n = 5, with k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:

Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:

Ω^U_k(i) = {0, 1} for i = 0; {0, 1, 2} for i = 1; {1, 2} for i = 2, for k = 1, 2, 3

Ω^U_0(0) = {0, 1, 2} for k = 0

For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E or U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F or u_1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined as the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J*_0(0) = min_{U_k ∈ Ω^U_k(X_k)} [ Σ_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., 3
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem.

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4}, with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2 and μ_1(2) = 2).
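The backward recursion and forward pass can be sketched in code. The following Python fragment (not part of the original thesis) solves this example with the value iteration algorithm; the arc costs are the ones used in the stage-by-stage computations of Appendix A.

```python
# Arc costs of the example: cost[(k, i)][u] is the cost of moving from state i
# at stage k to state u at stage k + 1 (states indexed as in Section 4.2.3.1).
cost = {
    (0, 0): {0: 2, 1: 4, 2: 3},   # A -> B, C, D
    (1, 0): {0: 4, 1: 6},         # B -> E, F
    (1, 1): {0: 2, 1: 1, 2: 3},   # C -> E, F, G
    (1, 2): {1: 5, 2: 2},         # D -> F, G
    (2, 0): {0: 2, 1: 5},         # E -> H, I
    (2, 1): {0: 7, 1: 3, 2: 2},   # F -> H, I, J
    (2, 2): {1: 1, 2: 2},         # G -> I, J
    (3, 0): {0: 4},               # H -> K
    (3, 1): {0: 2},               # I -> K
    (3, 2): {0: 7},               # J -> K
}

N = 4                              # index of the terminal stage (node K)
J = {(N, 0): 0.0}                  # terminal cost is zero
policy = {}

# Backward recursion: J*_k(i) = min_u { C_k(i, u) + J*_{k+1}(u) }
for k in range(N - 1, -1, -1):
    for (stage, i) in cost:
        if stage != k:
            continue
        best_u = min(cost[(k, i)],
                     key=lambda u: cost[(k, i)][u] + J[(k + 1, u)])
        J[(k, i)] = cost[(k, i)][best_u] + J[(k + 1, best_u)]
        policy[(k, i)] = best_u

# Forward pass: follow the optimal decisions from A.
names = {1: "BCD", 2: "EFG", 3: "HIJ", 4: "K"}
state, path = 0, ["A"]
for k in range(N):
    state = policy[(k, state)]
    path.append(names[k + 1][state])
print(J[(0, 0)], "-".join(path))   # 8.0 A-D-G-I-K
```

The forward pass reproduces the optimal path A-D-G-I-K with cost 8, matching Appendix A.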
Chapter 5
Finite Horizon Models
In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω^X_k.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).
Dynamics of the System and Transition Probabilities
In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N − 1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k + 1 is j, if the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k ∈ Ω^U_k(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N − 1
N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
X_k: State at stage k
U_k: Decision (action) at stage k
ω_k(i, u): Probabilistic function of the disturbance
C_k(j, u, i): Cost function
C_N(i): Terminal cost for state i
f_k(i, u, ω): Dynamic function
J*_0(i): Optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u∈Ω^U_k(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]   (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   (5.2)

Ω^X_k: State space at stage k
Ω^U_k(i): Decision space at stage k for state i
P_k(j, u, i): Transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*_N(i) = C_N(i)   ∀i ∈ Ω^X_N   (initialisation)

While k ≥ 0 do:

J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀i ∈ Ω^X_k

U*_k(i) = argmin_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀i ∈ Ω^X_k

k ← k − 1
u: Decision variable
U*_k(i): Optimal decision (action) at stage k for state i

The recursion finishes when the first stage is reached.
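As an illustration of this backward recursion, the following Python sketch (not from the thesis) applies the algorithm to a hypothetical two-state component model: state 0 = working, state 1 = failed; action 0 = do nothing, action 1 = replace. All transition probabilities and costs below are assumptions made for the illustration only.

```python
# Assumed model: P[u][i][j] is the transition probability from i to j under
# action u; C[u][i] is the stage cost of taking action u in state i.
P = {0: [[0.7, 0.3], [0.0, 1.0]],   # do nothing: may fail; failure is absorbing
     1: [[1.0, 0.0], [1.0, 0.0]]}   # replace: always back to working
C = {0: [0.0, 10.0],                # do nothing: downtime cost when failed
     1: [5.0, 15.0]}                # preventive vs. corrective replacement cost
N = 3                               # planning horizon (stages 0, 1, 2)

J = [0.0, 0.0]                      # terminal costs C_N(i) = 0
policy = {}
for k in range(N - 1, -1, -1):      # backward recursion of the algorithm above
    Jk = [0.0, 0.0]
    for i in (0, 1):
        values = {u: C[u][i] + sum(P[u][i][j] * J[j] for j in (0, 1))
                  for u in (0, 1)}
        policy[(k, i)] = min(values, key=values.get)
        Jk[i] = values[policy[(k, i)]]
    J = Jk

print(J, policy)   # J*_0 is approximately [6.6, 18.0]
```

Note how the optimal decision for a failed unit depends on the stage: replacement pays off early in the horizon, but not at the last stage, where its cost can no longer be recovered.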
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:

• N stages

• N_X state variables; the size of the set for each state variable is S

• N_U control variables; the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem thus increases exponentially with the size of the problem (the number of state and decision variables). This characteristic of SDP is called the curse of dimensionality.
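A back-of-the-envelope computation (the problem sizes are assumed for illustration) makes this growth concrete:

```python
# Operation count N * S**(2*N_X) * A**(N_U) from the complexity bound above,
# evaluated for two hypothetical problem sizes.
def operations(N, S, NX, A, NU):
    return N * S ** (2 * NX) * A ** NU

print(operations(10, 10, 1, 2, 1))   # 1 state variable:  2 000 operations
print(operations(10, 10, 5, 2, 5))   # 5 state variables: 3 200 000 000 000 operations
```

Going from one to five state variables multiplies the operation count by more than a factor of 10^9.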
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for a component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This reduces the uncertainties, but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. If there is no consumption, some generation units are stopped; this time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing the maintenance actions of offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep in memory the preceding states that can be visited. The computational price is, once again, very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov decision process. For more details, and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {μ, μ, ..., μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω^X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω^U(i).

The objective is to find the optimal μ*, which minimizes the cost-to-go function.

For different policies to be comparable, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1

μ: Decision policy
J*(i): Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1): the cost incurred at stage k has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum converges (a decreasing geometric progression):

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1

α: Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize:

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1
6.2 Optimality Equations
The optimality equations are formulated using the transition probabilities P_ij(u).

The stationary policy μ* solving an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P_ij(u) · [C_ij(u) + J*(j)]   ∀i ∈ Ω^X

J_μ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P_ij(u) · [C_ij(u) + α · J*(j)]   ∀i ∈ Ω^X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative criterion must be determined to stop the algorithm.

An alternative is the Policy Iteration (PI) algorithm, which terminates after a finite number of iterations.
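As a sketch of the discounted case, the following Python fragment (not from the thesis) runs value iteration on a hypothetical two-state component model (state 0 = working, 1 = failed; action 0 = do nothing, 1 = replace). The probabilities, costs and discount factor are assumptions made for illustration only.

```python
# Assumed stationary model: P[u][i][j] transition probabilities, C[u][i] stage
# costs, discount factor alpha.
P = {0: [[0.7, 0.3], [0.0, 1.0]], 1: [[1.0, 0.0], [1.0, 0.0]]}
C = {0: [0.0, 10.0], 1: [5.0, 15.0]}
alpha = 0.9

J = [0.0, 0.0]
for _ in range(500):        # the iteration contracts with factor alpha, so
    J = [min(C[u][i] + alpha * sum(P[u][i][j] * J[j] for j in (0, 1))
             for u in (0, 1))
         for i in (0, 1)]   # 500 iterations are ample for convergence here

policy = [min((0, 1),
              key=lambda u, i=i: C[u][i]
              + alpha * sum(P[u][i][j] * J[j] for j in (0, 1)))
          for i in (0, 1)]
print(J, policy)   # J* is approximately [31.89, 43.70]; replace only when failed
```

In practice a stopping criterion on the change of J between iterations would replace the fixed iteration count.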
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the policy using this expected cost-to-go function. This two-step algorithm is applied iteratively. The process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ_0. It can then be described by the following steps:

Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μ_q}(i), the solution of the following linear system, is calculated:

J_{μ_q}(i) = Σ_{j∈Ω^X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_{μ_q}(j)]

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.

Step 2: Policy Improvement

A new policy is obtained using one value iteration step:

μ_{q+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + J_{μ_q}(j)]

Go back to the policy evaluation step.

The process stops when μ_{q+1} = μ_q.
At each iteration the algorithm improves the policy. If the initial policy μ_0 is already good, then the algorithm will converge quickly to the optimal solution.
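The two steps can be sketched as follows, again on a hypothetical two-state model with assumed numbers (state 0 = working, 1 = failed; action 0 = do nothing, 1 = replace). A discounted variant is used here so that the policy evaluation system is always uniquely solvable; for this small model, the 2×2 linear system of Step 1 is solved exactly by Cramer's rule.

```python
# Assumed model and discount factor (illustrative values only).
P = {0: [[0.7, 0.3], [0.0, 1.0]], 1: [[1.0, 0.0], [1.0, 0.0]]}
C = {0: [0.0, 10.0], 1: [5.0, 15.0]}
alpha = 0.9

def evaluate(mu):
    """Step 1, policy evaluation: solve (I - alpha * P_mu) J = C_mu exactly."""
    A = [[(1.0 if i == j else 0.0) - alpha * P[mu[i]][i][j] for j in (0, 1)]
         for i in (0, 1)]
    b = [C[mu[i]][i] for i in (0, 1)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

mu = [0, 0]                       # initial policy: never replace
while True:
    J = evaluate(mu)
    new = [min((0, 1),            # step 2: policy improvement
               key=lambda u, i=i: C[u][i]
               + alpha * sum(P[u][i][j] * J[j] for j in (0, 1)))
           for i in (0, 1)]
    if new == mu:                 # the policy is a solution of its own improvement
        break
    mu = new

print(mu, J)   # terminates with mu = [0, 1] after a few iterations
```

Starting from the policy "never replace", the algorithm reaches the optimal policy (replace only when failed) in a few iterations, in line with the finite-termination property of PI.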
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, in each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i).

While m ≥ 0 do:

J^m_{μ_k}(i) = Σ_{j∈Ω^X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j)]   ∀i ∈ Ω^X

m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated, and the convergence of the algorithms requires conditions on the Markov decision process. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X̄ ∈ Ω^X, there are a unique λ_μ and vector h_μ such that:

h_μ(X̄) = 0

λ_μ + h_μ(i) = Σ_{j∈Ω^X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)]   ∀i ∈ Ω^X

This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ Ω^X

μ*(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ Ω^X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. X̄ is an arbitrary reference state, and h^0(i) is chosen arbitrarily:

H^k = min_{u∈Ω^U(X̄)} Σ_{j∈Ω^X} P(j, u, X̄) · [C(j, u, X̄) + h^k(j)]

h^{k+1}(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h^k(j)] − H^k   ∀i ∈ Ω^X

μ^{k+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h^k(j)]   ∀i ∈ Ω^X

The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, however, infinite in theory.
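A sketch of relative value iteration on a hypothetical unichain two-state model (state 0 = working, 1 = failed; action 0 = do nothing, 1 = replace; all numbers are illustrative assumptions):

```python
# Assumed model; reference state 0 is used, so H tracks the value at state 0.
P = {0: [[0.7, 0.3], [0.0, 1.0]], 1: [[1.0, 0.0], [1.0, 0.0]]}
C = {0: [0.0, 10.0], 1: [5.0, 15.0]}

h = [0.0, 0.0]
for _ in range(200):
    # One application of the Bellman operator, then subtract the value at the
    # reference state so that h(0) stays pinned to zero.
    T = [min(C[u][i] + sum(P[u][i][j] * h[j] for j in (0, 1)) for u in (0, 1))
         for i in (0, 1)]
    H = T[0]
    h = [T[i] - H for i in (0, 1)]

print(H, h)   # H converges to the optimal average cost 45/13, about 3.46
```

Here H converges to the optimal average cost per stage λ* = 45/13 ≈ 3.46 and h to the corresponding differential cost vector, with h(0) = 0 by construction.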
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.

Initialisation: X̄ can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) ∀i ∈ Ω^X, stop the algorithm.

Else, solve the system of equations:

h^q(X̄) = 0
λ^q + h^q(i) = Σ_{j∈Ω^X} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + h^q(j)]   ∀i ∈ Ω^X

Step 2: Policy improvement

μ^{q+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h^q(j)]   ∀i ∈ Ω^X

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP the optimal cost-to-go satisfies

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω_X} J(i)
Subject to J(i) − α · Σ_{j∈Ω_X} P(j, u, i) · J(j) ≤ Σ_{j∈Ω_X} P(j, u, i) · C(j, u, i), ∀u, i
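A sketch of this LP formulation, assuming SciPy is available: the optimal cost-to-go is the largest J satisfying the inequalities, so Σ_i J(i) is maximized; since `linprog` minimizes, the sign of the objective is flipped.

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """Solve a discounted MDP by linear programming.

    J* is the largest function satisfying, for every pair (i, u),
      J(i) <= sum_j P(j,u,i) * (C(j,u,i) + alpha * J(j)),
    so we maximize sum_i J(i) subject to these constraints.
    """
    n = P[0].shape[0]
    A_ub, b_ub = [], []
    for u in range(len(P)):
        for i in range(n):
            row = -alpha * P[u][i]          # -alpha * sum_j P(j,u,i) J(j)
            row[i] += 1.0                   # + J(i)
            A_ub.append(row)
            b_ub.append(np.sum(P[u][i] * C[u][i]))
    res = linprog(c=-np.ones(n),            # maximize sum_i J(i)
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n, method="highs")
    return res.x                            # J*(i)
```

On a small example, the LP solution coincides with the fixed point of discounted value iteration.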
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is mⁿ [41]. But linear programming methods become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, the algorithm converges quite fast if the initial policy µ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and the actions are not taken continuously (that kind of problem belongs to optimal control theory).
SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided from simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ using samples resulting from the use of this policy. The method can be used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy µ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, µ(X_k)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state X_k, is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})
V(X_k): cost-to-go of a trajectory starting from state X_k.
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): cost-to-go of the trajectory starting from state i at its m-th visit.
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.
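The recursive form with γ = 1/m is simply an incremental way of computing the arithmetic mean of the observed costs-to-go, which a few lines of code can verify (a sketch; the sample values in the usage example are invented):

```python
def incremental_mean(samples):
    """Recursive estimate J := J + (1/m) * (V - J) after the m-th sample.
    After m samples, J equals the arithmetic mean of the first m values."""
    J = 0.0
    for m, V in enumerate(samples, start=1):
        J += (1.0 / m) * (V - J)
    return J
```

For example, `incremental_mean([3.0, 5.0, 10.0])` returns the mean 6.0.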
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).
At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the l-th transition has just been generated; then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
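The TD(0) update can be sketched as follows. This is a minimal sketch under assumptions: trajectories are supplied by a caller-provided sampler, the terminal state has cost-to-go zero, and the step size is 1/(number of visits), as in the text.

```python
def td0_policy_evaluation(sample_trajectory, n_states, n_trajectories=5000):
    """TD(0) evaluation of a fixed policy on a stochastic shortest path problem.

    sample_trajectory() returns a list of (state, cost, next_state) tuples
    ending at the terminal state; the step size gamma_x decreases as
    1 / (number of visits of state x).
    """
    J = [0.0] * n_states
    visits = [0] * n_states
    for _ in range(n_trajectories):
        for x, c, x_next in sample_trajectory():
            visits[x] += 1
            gamma = 1.0 / visits[x]
            J[x] += gamma * (c + J[x_next] - J[x])   # temporal difference
    return J
```

On a deterministic two-step chain with unit costs, J converges to the exact costs-to-go (2, 1, 0).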
Q-factors: Once J^{µ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^{µ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J^{µ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

µ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q^{µ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^{µ_k} and Q^{µ_k} have been estimated using the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3). Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_k, X_{k+1}, U_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
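A tabular sketch of this update, under assumptions made for the example: samples come from a caller-provided `step(x, u)` simulator, episodes start in state 0, ε-greedy action selection handles the exploration/exploitation trade-off, and the step size is 1/(number of visits of the pair).

```python
import random

def q_learning(step, n_states, n_actions, terminal, episodes=20000, eps=0.1):
    """Tabular Q-learning from simulated samples, with epsilon-greedy
    exploration (the exploration scheme and the 1/visits step size are
    assumptions made for this sketch).

    step(x, u) returns (cost, next_state); episodes start in state 0.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        x = 0
        while x != terminal:
            if random.random() < eps:                      # explore
                u = random.randrange(n_actions)
            else:                                          # exploit (greedy)
                u = min(range(n_actions), key=lambda a: Q[x][a])
            c, x_next = step(x, u)
            visits[x][u] += 1
            g = 1.0 / visits[x][u]
            best_next = 0.0 if x_next == terminal else min(Q[x_next])
            Q[x][u] = (1 - g) * Q[x][u] + g * (c + best_next)
            x = x_next
    return Q
```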
The exploration/exploitation trade-off: The convergence of the algorithms to the optimal solution would require that all the pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J^µ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J^µ. In the table representation investigated previously, J^µ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^µ(i) − J̃(i, r).
There are many possible methods for function approximation; this field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, and Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Choose a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
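As a minimal illustration of such an approximation structure, a linear architecture J̃(i, r) = φ(i)ᵀr can be fitted by least squares to sampled cost-to-go values. This is a sketch only; the feature map and the sampled targets in the usage example are invented.

```python
import numpy as np

def fit_linear_cost_to_go(states, targets, features):
    """Fit J_tilde(i, r) = features(i) . r by least squares on sampled
    cost-to-go values. Only the parameter vector r is stored, instead of
    one table entry per state."""
    Phi = np.array([features(i) for i in states])          # design matrix
    r, *_ = np.linalg.lstsq(Phi, np.array(targets, dtype=float), rcond=None)
    return r
```

With features φ(i) = (1, i) and noise-free samples of a linear target, the true parameters are recovered exactly.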
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a three-unit example with 4, 5, and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic. The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDPs have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is pointed out. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes (average cost-to-go, discounted, shortest path)
- Characteristics: stationary model
- Possible application: continuous-time condition monitoring maintenance optimization
- Methods: value iteration (VI), which can converge fast for a high discount factor; policy iteration (PI), faster in general; linear programming, which allows possible additional constraints but with a state space more limited than for VI and PI

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application: optimization of inspection-based maintenance
- Methods: same as MDP (average cost-to-go approach)
- Advantages/disadvantages: complex

Approximate Dynamic Programming
- Characteristics: can handle large state spaces compared with classical MDP methods
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages/disadvantages: can work without an explicit model
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance in a profitable period. This idea was taken into account in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium, and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers:
N_E: number of electricity scenarios
N_W: number of working states for the component
N_PM: number of preventive maintenance states for one component
N_CM: number of corrective maintenance states for one component

Costs:
C_E(s, k): electricity price at stage k for the electricity state s
C_I: cost per stage for interruption
C_PM: cost per stage of preventive maintenance
C_CM: cost per stage of corrective maintenance
C_N(i): terminal cost if the component is in state i

Variables:
i1: component state at the current stage
i2: electricity state at the current stage
j1: possible component state for the next stage
j2: possible electricity state for the next stage

State and control space:
x1_k: component state at stage k
x2_k: electricity state at stage k

Probability functions:
λ(t): failure rate of the component at age t
λ(W_i): failure rate of the component in state W_i

Sets:
Ω_{x1}: component state space
Ω_{x2}: electricity state space
Ω_U(i): decision space for state i

State notations:
W: working state
PM: preventive maintenance state
CM: corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector X_k is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (N_X = 2). The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k),  x1_k ∈ Ω_{x1}, x2_k ∈ Ω_{x2}   (9.1)

Ω_{x1} is the set of possible states for the component, and Ω_{x2} the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond to N_CM and N_PM, respectively.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age T_max is reached; in this case, T_max can for example correspond to the time after which λ(t) exceeds a fixed threshold. This second approach was implemented. In both cases, the corresponding number of W states is N_W = T_max/Ts, or the closest integer.
Figure 9.1: Example of the Markov decision process for one component, with N_CM = 3, N_PM = 2, N_W = 4. Solid lines: u = 0; dashed lines: u = 1. Under u = 0, a working state W_q moves to the next working state with probability 1 − Ts·λ(q) and to CM_1 with probability Ts·λ(q); under u = 1 it moves to PM_1; the PM and CM states progress deterministically back to W_0.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_{x1} = {W_0, ..., W_4, PM_1, CM_1, CM_2}. The state W_0 is used to represent a new component; PM_2 and CM_3 are both represented by this state.

More generally,

Ω_{x1} = {W_0, ..., W_{N_W}, PM_1, ..., PM_{N_PM−1}, CM_1, ..., CM_{N_CM−1}}
Electricity scenario state

Electricity scenarios are associated with the state variable x2_k. There are N_E possible states for this variable, each corresponding to one possible electricity scenario: x2_k ∈ Ω_{x2} = {S_1, ..., S_{N_E}}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium, and low electricity prices (respectively dry, normal, and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, N_E = 3: the electricity price (SEK/MWh) of each of the three scenarios as a function of the stage.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W_1, ..., W_{N_W}}, and ∅ otherwise.
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · P_k(j2, i2)
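Because the two state variables evolve independently, the transition matrix of the joint state (x1, x2) is the Kronecker product of the two individual transition matrices. A small sketch (the matrices in the usage example are invented):

```python
import numpy as np

def joint_transition_matrix(P_component, P_electricity):
    """Transition matrix of the joint state (x1, x2) when the two state
    variables evolve independently: the Kronecker product of the
    component and electricity transition matrices."""
    return np.kron(P_component, P_electricity)
```

For example, with a 2-state component matrix and a 2-state electricity matrix, the joint matrix is 4x4 and each entry is the product of the two marginal transition probabilities.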
Component state transition probability

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the stage and equal to λ(W_q) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if N_PM = 1 or N_CM = 1, then PM_1, respectively CM_1, corresponds to W_0.
Electricity state

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E, and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1 | u | j1 | P(j1, u, i1)
W_q, q ∈ {0, ..., N_W−1} | 0 | W_{q+1} | 1 − λ(W_q)
W_q, q ∈ {0, ..., N_W−1} | 0 | CM_1 | λ(W_q)
W_{N_W} | 0 | W_{N_W} | 1 − λ(W_{N_W})
W_{N_W} | 0 | CM_1 | λ(W_{N_W})
W_q, q ∈ {0, ..., N_W} | 1 | PM_1 | 1
PM_q, q ∈ {1, ..., N_PM−2} | ∅ | PM_{q+1} | 1
PM_{N_PM−1} | ∅ | W_0 | 1
CM_q, q ∈ {1, ..., N_CM−2} | ∅ | CM_{q+1} | 1
CM_{N_CM−1} | ∅ | W_0 | 1
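The rows of Table 9.1 can be turned into a stationary transition matrix for the component state. This is a sketch under extra assumptions: states are indexed in the order W_0..W_{N_W}, PM_1..PM_{N_PM−1}, CM_1..CM_{N_CM−1}; `failure_prob[q]` is the per-stage failure probability in state W_q; and N_PM ≥ 2, N_CM ≥ 2 so that the PM and CM chains exist.

```python
import numpy as np

def component_transition_matrix(failure_prob, n_w, n_pm, n_cm, u):
    """Stationary component-state transition matrix under decision u,
    following Table 9.1 (assumes n_pm >= 2 and n_cm >= 2).
    State order: W0..W_{n_w}, PM1..PM_{n_pm-1}, CM1..CM_{n_cm-1}."""
    n = (n_w + 1) + (n_pm - 1) + (n_cm - 1)
    PM1, CM1 = n_w + 1, n_w + n_pm             # indices of PM1 and CM1
    P = np.zeros((n, n))
    for q in range(n_w + 1):                   # working states W_q
        if u == 1:                             # preventive replacement
            P[q, PM1] = 1.0
        else:
            nxt = min(q + 1, n_w)              # component ages, capped at W_{n_w}
            P[q, nxt] = 1.0 - failure_prob[q]
            P[q, CM1] = failure_prob[q]
    for s in range(PM1, CM1):                  # PM chain -> ... -> W0
        P[s, s + 1 if s + 1 < CM1 else 0] = 1.0
    for s in range(CM1, n):                    # CM chain -> ... -> W0
        P[s, s + 1 if s + 1 < n else 0] = 1.0
    return P
```

With N_W = 4, N_PM = 2, N_CM = 3 this reproduces the structure of Figure 9.1.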
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E =
| 1  0  0 |
| 0  1  0 |
| 0  0  1 |

P2_E =
| 1/3  1/3  1/3 |
| 1/3  1/3  1/3 |
| 1/3  1/3  1/3 |

P3_E =
| 0.6  0.2  0.2 |
| 0.2  0.6  0.2 |
| 0.2  0.2  0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):   0     1     2     3     4     5     6     7     8     9     10    11
P_k(j2,i2):  P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• reward for electricity generation: G · Ts · C_E(i2, k) (depends on the electricity scenario state i2 and the stage k);

• cost for maintenance: C_CM or C_PM;

• cost for interruption: C_I.

Moreover, a terminal cost, denoted C_N, can be used to penalize deviations from a required state at the end of the time horizon; a possible terminal cost is defined by C_N(i) for each possible terminal state i of the component. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4: Transition costs

  i1                          u    j1       Ck(j, u, i)
  Wq, q ∈ {0,...,NW−1}        0    Wq+1     G · Ts · CE(i2, k)
  Wq, q ∈ {0,...,NW−1}        0    CM1      CI + CCM
  W_NW                        0    W_NW     G · Ts · CE(i2, k)
  W_NW                        0    CM1      CI + CCM
  Wq                          1    PM1      CI + CPM
  PMq, q ∈ {1,...,NPM−2}      ∅    PMq+1    CI + CPM
  PM_{NPM−1}                  ∅    W0       CI + CPM
  CMq, q ∈ {1,...,NCM−2}      ∅    CMq+1    CI + CCM
  CM_{NCM−1}                  ∅    W0       CI + CCM
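The component part of Table 9.1 can be sketched directly in code. The fragment below is an illustration only, not part of the thesis model: the sizes N_W, N_PM, N_CM and the failure probability function lam() are hypothetical placeholder values (and N_PM, N_CM >= 2 is assumed).

```python
# Sketch of the non-zero component transitions of Table 9.1.
# N_W, N_PM, N_CM and lam() are hypothetical illustration values.
N_W, N_PM, N_CM = 3, 2, 2                 # assumes N_PM, N_CM >= 2

def lam(q):
    """Hypothetical increasing failure probability lambda(W_q)."""
    return 0.05 * (q + 1)

def transitions(state, u=0):
    """Return {next_state: probability} for states ('W', q), ('PM', q), ('CM', q)."""
    kind, q = state
    if kind == 'W':
        if u == 1:                        # preventive replacement decided
            return {('PM', 1): 1.0}
        ages_to = ('W', q + 1) if q < N_W else ('W', N_W)   # W_NW is absorbing
        return {ages_to: 1.0 - lam(q), ('CM', 1): lam(q)}
    # PM and CM chains advance deterministically, then return to W_0
    last = (N_PM if kind == 'PM' else N_CM) - 1
    return {('W', 0): 1.0} if q == last else {(kind, q + 1): 1.0}
```

Every returned distribution sums to one, which is a useful sanity check when assembling the full transition matrix of the Markov decision process.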
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rent can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

  NC       Number of components
  NWc      Number of working states for component c
  NPMc     Number of preventive maintenance states for component c
  NCMc     Number of corrective maintenance states for component c

Costs

  CPMc     Cost per stage of preventive maintenance for component c
  CCMc     Cost per stage of corrective maintenance for component c
  CNc(i)   Terminal cost if component c is in state i

Variables

  ic, c ∈ {1,...,NC}    State of component c at the current stage
  iNC+1                 State of the electricity at the current stage
  jc, c ∈ {1,...,NC}    State of component c at the next stage
  jNC+1                 State of the electricity at the next stage
  uc, c ∈ {1,...,NC}    Decision variable for component c

State and Control Space

  xck, c ∈ {1,...,NC}   State of component c at stage k
  xc                    A component state
  xNC+1,k               Electricity state at stage k
  uck                   Maintenance decision for component c at stage k

Probability functions

  λc(i)    Failure probability function for component c

Sets

  Ω_xc         State space for component c
  Ω_xNC+1      Electricity state space
  Ω_uc(ic)     Decision space for component c in state ic
9.2.3 Assumptions

bull The system is composed of NC components in series. If one component fails, the whole system fails.

bull The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1,...,NC}.

bull If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

bull It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

bull An interruption cost CI is considered whenever maintenance is done on the system.

bull The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh is produced during the stage (Ts in hours).

bull A terminal cost CNc can be used to penalize the terminal state of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector, as in (9.2):

  Xk = (x1k, ..., x_NC,k, x_NC+1,k)^T    (9.2)

x_ck, c ∈ {1,...,NC}, represents the state of component c, and x_NC+1,k represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for component c, NWc, is decided in the same way as in the one-component case. The state space related to component c is noted Ω_xc:

  x_ck ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}
Electricity Space
Same as in Section 8.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

  u_ck = 0: no preventive maintenance on component c
  u_ck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

  Uk = (u1k, u2k, ..., u_NC,k)^T    (9.3)

The decision space for each decision variable can be defined by:

  ∀c ∈ {1,...,NC}:  Ω_uc(ic) = {0, 1} if ic ∈ {W0,...,W_NWc},  ∅ otherwise
9.2.4.3 Transition Probability

The component state variables xc are independent of the electricity state x_NC+1. Consequently:

  P(Xk+1 = j | Uk = U, Xk = i)                                              (9.4)
    = P((j1,...,j_NC) | (u1,...,u_NC), (i1,...,i_NC)) · P(j_NC+1 | i_NC+1)  (9.5)

The transition probabilities of the electricity states, P(j_NC+1 | i_NC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 8.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1,...,NC}: ic ∈ {W0,...,W_NWc} and uc = 0,

  P((j1,...,j_NC) | 0, (i1,...,i_NC)) = ∏_{c=1}^{NC} P(jc | 0, ic)
Case 2
If one of the components is in maintenance, or the decision of preventive maintenance is taken for at least one component:

  P((j1,...,j_NC) | (u1,...,u_NC), (i1,...,i_NC)) = ∏_{c=1}^{NC} P^c

  with P^c = P(jc | 1, ic)   if uc = 1 or ic ∉ {W0,...,W_NWc}
       P^c = 1               if ic ∈ {W0,...,W_NWc}, uc = 0 and jc = ic
       P^c = 0               otherwise
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1,...,NC}: ic ∈ {W0,...,W_NWc},

  C((j1,...,j_NC) | 0, (i1,...,i_NC)) = G · Ts · CE(i_NC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

  C((j1,...,j_NC) | (u1,...,u_NC), (i1,...,i_NC)) = CI + Σ_{c=1}^{NC} C^c

  with C^c = CCMc   if ic ∈ {CM1,...,CM_NCMc} or jc = CM1
       C^c = CPMc   if ic ∈ {PM1,...,PM_NPMc} or jc = PM1
       C^c = 0      otherwise
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

bull Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

bull Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality, there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

bull Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

bull Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

bull Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite-horizon and infinite-horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite-horizon SDP models. A comparison of the methods available for infinite-horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm empirically converges the fastest; however, for high discount rates, the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite-horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite-horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single-state-variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite-horizon models. However, few finite-horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite-horizon models are possible: either directly a finite-horizon model, or a discounted infinite-horizon model, which is an approximate finite-horizon model that must be stationary over time.

An idea could be to extend the finite-horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite-horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite-horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A.-H. Mohamed. Inspection, maintenance and replacement models. Computers & Operations Research, 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] A. Rangan, D. Ahyagarajan, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space Ω_Xk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ Ω_Uk(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost that the decision maker has to pay, Ck(i, u). A possible terminal cost CN(XN) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamic of the system:

  J*_0(X0) = min_{Uk} Σ_{k=0}^{N−1} Ck(Xk, Uk) + CN(XN)

  subject to Xk+1 = fk(Xk, Uk), k = 0, ..., N − 1

  N           Number of stages
  k           Stage
  i           State at the current stage
  j           State at the next stage
  Xk          State at stage k
  Uk          Decision action at stage k
  Ck(i, u)    Cost function
  CN(i)       Terminal cost for state i
  fk(i, u)    Dynamic function
  J*_0(i)     Optimal cost-to-go starting from state i
4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

  J*_k(i) = min_{u ∈ Ω_Uk(i)} { Ck(i, u) + J*_{k+1}(fk(i, u)) }    (4.1)

  J*_k(i)    Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

  J*_N(i) = CN(i)   ∀i ∈ Ω_XN   (initialisation)

  J*_k(i) = min_{u ∈ Ω_Uk(i)} { Ck(i, u) + J*_{k+1}(fk(i, u)) }      ∀i ∈ Ω_Xk
  U*_k(i) = argmin_{u ∈ Ω_Uk(i)} { Ck(i, u) + J*_{k+1}(fk(i, u)) }   ∀i ∈ Ω_Xk

  u          Decision variable
  U*_k(i)    Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.
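The backward recursion can be written directly in code. The function below is a generic illustration; the names and the call signature are our own, not from the thesis:

```python
def value_iteration(N, states, controls, f, C, C_N):
    """Backward value iteration for a deterministic finite-horizon DP.

    states[k]      -- iterable of admissible states at stage k
    controls(k, i) -- admissible controls for state i at stage k
    f(k, i, u)     -- dynamic function, returns the next state
    C(k, i, u)     -- transition cost
    C_N(i)         -- terminal cost
    Returns the cost-to-go tables J[k][i] and the policy U[k][i].
    """
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = C_N(i)                  # initialisation: J*_N = C_N
    for k in range(N - 1, -1, -1):        # backwards from stage N-1 to 0
        for i in states[k]:
            u_best = min(controls(k, i),
                         key=lambda u: C(k, i, u) + J[k + 1][f(k, i, u)])
            U[k][i] = u_best
            J[k][i] = C(k, i, u_best) + J[k + 1][f(k, i, u_best)]
    return J, U
```

The `min` with a key function computes the minimization and the argmin of equation (4.1) in one pass.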
4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: shortest path network. The nodes are arranged in five stages: A (stage 0); B, C, D (stage 1); E, F, G (stage 2); H, I, J (stage 3); K (stage 4). Each arc is labelled with its cost: A-B=2, A-C=4, A-D=3; B-E=4, B-F=6; C-E=2, C-F=1, C-G=3; D-F=5, D-G=2; E-H=2, E-I=5; F-H=7, F-I=3, F-J=2; G-I=1, G-J=2; H-K=4, I-K=2, J-K=7.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

  Ω_X0 = {A} = {0}
  Ω_X1 = {B, C, D} = {0, 1, 2}
  Ω_X2 = {E, F, G} = {0, 1, 2}
  Ω_X3 = {H, I, J} = {0, 1, 2}
  Ω_X4 = {K} = {0}
Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notation is used:

  Ω_Uk(i) = {0, 1}     for i = 0
            {0, 1, 2}  for i = 1     for k = 1, 2, 3
            {1, 2}     for i = 2

  Ω_U0(0) = {0, 1, 2}  for k = 0

For example, Ω_U1(0) = Ω_U(B) = {0, 1}, with u1(0) = 0 for the transition B ⇒ E, or u1(0) = 1 for the transition B ⇒ F.

Another example: Ω_U1(2) = Ω_U(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F, or u1(2) = 2 for the transition D ⇒ G.

A sequence π = {μ0, μ1, ..., μN}, where μk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple, thanks to the notation used: fk(i, u) = u.

The transition costs are defined as equal to the distance from one state to the resulting state of the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

  J*_0(0) = min_{Uk ∈ Ω_Uk(Xk)} Σ_{k=0}^{3} Ck(Xk, Uk) + C4(X4)

  subject to Xk+1 = fk(Xk, Uk), k = 0, 1, ..., 3
4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ0, μ1, μ2, μ3, μ4}, with μk(i) = u*_k(i) (for example, μ1(1) = 2, μ1(2) = 2).
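The whole example can be checked mechanically. Below is a small Python sketch of the value iteration on this graph; the arc-cost table is transcribed from the figure and the calculations of Appendix A:

```python
# Backward value iteration on the five-stage shortest-path example.
costs = {
    'A': {'B': 2, 'C': 4, 'D': 3},
    'B': {'E': 4, 'F': 6},
    'C': {'E': 2, 'F': 1, 'G': 3},
    'D': {'F': 5, 'G': 2},
    'E': {'H': 2, 'I': 5},
    'F': {'H': 7, 'I': 3, 'J': 2},
    'G': {'I': 1, 'J': 2},
    'H': {'K': 4},
    'I': {'K': 2},
    'J': {'K': 7},
}
stages = [['A'], ['B', 'C', 'D'], ['E', 'F', 'G'], ['H', 'I', 'J'], ['K']]

J = {'K': 0}                      # terminal cost: phi(K) = 0
policy = {}
for stage in reversed(stages[:-1]):          # stages 3, 2, 1, 0
    for node in stage:
        # Bellman equation: J*(i) = min_u { C(i, u) + J*(f(i, u)) }
        policy[node], J[node] = min(
            ((succ, c + J[succ]) for succ, c in costs[node].items()),
            key=lambda t: t[1])

# Recover the optimal path forwards, starting from A.
path, node = ['A'], 'A'
while node != 'K':
    node = policy[node]
    path.append(node)

print(J['A'], path)   # 8 ['A', 'D', 'G', 'I', 'K']
```

The printed result reproduces the optimal cost-to-go J*_0(0) = 8 and the path A ⇒ D ⇒ G ⇒ I ⇒ K of the text.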
Chapter 5
Finite Horizon Models
In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the proposed model in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic, as it was in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k isin 0 N represents the different stages of the problem In generalit corresponds to a time variable
The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ Ω_Xk.
Decision Space
At each decision epoch, the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω_Uk(i).
Dynamic of the System and Transition Probability
In contrast to the deterministic case, the state transition depends not only on the control used but also on a disturbance ω = ωk(i, u):

  Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N − 1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control at stage k are i and u. These probabilities can also depend on the stage:

  Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:

  P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

  Ck(j, u, i) = Ck(Xk+1 = j, Uk = u, Xk = i)

If the transition (i, j) occurs at stage k when the decision is u, then a cost Ck(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(j, u, i).
A terminal cost CN (i) can be used to penalize deviation from a desired terminalstate
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:

  J*(X0) = min_{Uk ∈ Ω_Uk(Xk)} E[ CN(XN) + Σ_{k=0}^{N−1} Ck(Xk+1, Uk, Xk) ]

  subject to Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, ..., N − 1
  N              Number of stages
  k              Stage
  i              State at the current stage
  j              State at the next stage
  Xk             State at stage k
  Uk             Decision action at stage k
  ωk(i, u)       Probabilistic function of the disturbance
  Ck(j, u, i)    Cost function
  CN(i)          Terminal cost for state i
  fk(i, u, ω)    Dynamic function
  J*_0(i)        Optimal cost-to-go starting from state i
5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

  J*_k(i) = min_{u ∈ Ω_Uk(i)} E[ Ck(i, u) + J*_{k+1}(fk(i, u, ω)) ]    (5.1)

This equation defines a condition for a cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

  J*_k(i) = min_{u ∈ Ω_Uk(i)} Σ_{j ∈ Ω_Xk+1} Pk(j, u, i) · [ Ck(j, u, i) + J*_{k+1}(j) ]    (5.2)

  Ω_Xk           State space at stage k
  Ω_Uk(i)        Decision space at stage k for state i
  Pk(j, u, i)    Transition probability function
5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursions, it determines at each stage the optimal decision for each state of the system:

  J*_N(i) = CN(i)   ∀i ∈ Ω_XN   (initialisation)

  While k ≥ 0 do:
    J*_k(i) = min_{u ∈ Ω_Uk(i)} Σ_{j ∈ Ω_Xk+1} Pk(j, u, i) · [ Ck(j, u, i) + J*_{k+1}(j) ]      ∀i ∈ Ω_Xk
    U*_k(i) = argmin_{u ∈ Ω_Uk(i)} Σ_{j ∈ Ω_Xk+1} Pk(j, u, i) · [ Ck(j, u, i) + J*_{k+1}(j) ]   ∀i ∈ Ω_Xk
    k ← k − 1

  u          Decision variable
  U*_k(i)    Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
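A direct transcription of this algorithm into code can look as follows. This is a hedged sketch with our own naming, where P(k, j, u, i) and C(k, j, u, i) follow the notation of equation (5.2):

```python
def stochastic_value_iteration(N, states, controls, P, C, C_N):
    """Backward value iteration for a stochastic finite-horizon DP.

    states[k]      -- iterable of admissible states at stage k
    controls(k, i) -- admissible controls for state i at stage k
    P(k, j, u, i)  -- transition probability P_k(j | u, i)
    C(k, j, u, i)  -- transition cost C_k(j, u, i)
    C_N(i)         -- terminal cost
    Returns the cost-to-go tables J[k][i] and the policy U[k][i].
    """
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = C_N(i)                  # initialisation: J*_N = C_N
    for k in range(N - 1, -1, -1):        # backward recursion
        for i in states[k]:
            def expected(u):
                # Expected cost of playing u in state i at stage k
                return sum(P(k, j, u, i) * (C(k, j, u, i) + J[k + 1][j])
                           for j in states[k + 1])
            U[k][i] = min(controls(k, i), key=expected)
            J[k][i] = expected(U[k][i])
    return J, U
```

Compared with the deterministic version, the only change is that the successor value is replaced by an expectation over all next states, weighted by the transition probabilities.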
5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

bull N stages,

bull N_X state variables, where the size of the set for each state variable is S,

bull N_U control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem thus increases exponentially with the size of the problem (the number of state and decision variables). This characteristic of SDP is called the curse of dimensionality.
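To make the growth concrete, here is a small illustrative calculation with hypothetical sizes (N = 52 weekly stages, S = 10 states and A = 2 actions per variable, and one decision variable per state variable); the numbers are for illustration only:

```python
# Work estimate N * S**(2*N_X) * A**(N_U) for growing problem sizes,
# with the hypothetical choice N_X = N_U = n.
N, S, A = 52, 10, 2
ops = {n: N * S**(2 * n) * A**n for n in (1, 2, 3, 4)}
for n, work in ops.items():
    print(n, work)
```

With one state variable the estimate is about 10^4 elementary operations; with four state variables it already exceeds 8 · 10^10, which is why ADP methods become necessary for multi-variable models.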
5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
552 Forecasts
Measurements or forecasts can sometime estimate the disturbance a system is orcan be subject to The reliability of the forecasts should be carefully consideredDeterministic information could be used to adapt the finite horizon model on theirhorizon of validity It would also be possible to generate different scenarios fromforcasts solve the problem for the different scenarios and get some conclusions fromthe different solutions Another way of using forecasting models is to include them inthe maintenance problem formulation by adding a specific variable It will reducethe uncertainties but in return increase the complexity The proposed model inChapter 9 gives an example of how to integrate a forecasting model in an electricityscenario
Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption; if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions at offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions at offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on time, if the system dynamics are not stationary).
This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep in memory the preceding states that have been visited. The computational price is once again very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage; it would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov decision process. For more details and proofs of the convergence of the algorithms, [36] or the introduction chapter of [13] are recommended.
In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.
The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution of the problem has the form π = (μ, μ, μ, ...), where μ is a function mapping the state space into the control space. For i ∈ Ω_X, μ(i) is an admissible control for the state i: μ(i) ∈ Ω_U(i).
The objective is to find the optimal μ*, which minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.
J*(X_0) = min_μ E[lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k)]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1
μ: Decision policy
J*(i): Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α (0 < α < 1) is the discount factor. The cost at stage k of a discounted IHSDP has the form α^k · C_ij(u).
As C_ij(u) is bounded, the infinite sum converges (decreasing geometric progression).
J*(X_0) = min_μ E[lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k)]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1
α: Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounted costs. To keep the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize
J* = min_μ lim_{N→∞} E[(1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k)]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1
6.2 Optimality Equations
The optimality equations are formulated using the probability function P(j, u, i).
The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + J*(j)], ∀i ∈ Ω_X
J_μ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + α · J*(j)], ∀i ∈ Ω_X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative criterion must be determined to stop the algorithm.
An alternative to this method is the policy iteration (PI) algorithm. The latter terminates after a finite number of iterations.
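As a concrete illustration, value iteration for a discounted problem can be sketched as below. The 2-state, 2-action MDP, its transition probabilities P, costs C and discount factor are all invented for illustration; the sketch simply iterates the optimality equation until successive value functions are close.

```python
# Minimal value-iteration sketch for a discounted IHSDP (hypothetical
# 2-state, 2-action MDP). P[u][i][j] is the probability of moving from
# state i to state j under control u; C[u][i][j] is the transition cost.

ALPHA = 0.9  # discount factor

P = {0: [[0.8, 0.2], [0.3, 0.7]],
     1: [[0.5, 0.5], [0.1, 0.9]]}
C = {0: [[1.0, 2.0], [2.0, 3.0]],
     1: [[0.5, 3.0], [1.0, 4.0]]}

def q_value(J, i, u):
    # expected one-stage cost plus discounted cost-to-go
    return sum(P[u][i][j] * (C[u][i][j] + ALPHA * J[j]) for j in range(2))

def value_iteration(eps=1e-9):
    J = [0.0, 0.0]
    while True:
        J_new = [min(q_value(J, i, u) for u in range(2)) for i in range(2)]
        if max(abs(a - b) for a, b in zip(J, J_new)) < eps:
            return J_new
        J = J_new

J_star = value_iteration()
# greedy policy with respect to the converged cost-to-go
policy = [min(range(2), key=lambda u: q_value(J_star, i, u))
          for i in range(2)]
```

For this example the fixed point can also be obtained by solving the 2×2 linear system for the optimal policy, which makes the sketch easy to check by hand.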
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the actual policy. This two-step algorithm is used iteratively; the process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ_0. It can then be described by the following steps.
Step 1: Policy Evaluation
If μ_{q+1} = μ_q, stop the algorithm. Else, J_μq(i), the solution of the following linear system, is calculated:
J_μq(i) = Σ_{j∈Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_μq(j)], ∀i ∈ Ω_X
q: Iteration number of the policy iteration algorithm
This is the expected cost-to-go function of the system using the policy μ_q.
Step 2: Policy Improvement
A new policy is obtained using one step of the value iteration algorithm:
μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_μq(j)], ∀i ∈ Ω_X
Go back to the policy evaluation step.
The process stops when μ_{q+1} = μ_q.
At each iteration the algorithm always improves the policy. If the initial policy μ_0 is already good, the algorithm converges quickly to the optimal solution.
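The two-step loop above can be sketched as follows. The 2-state, 2-action discounted MDP and all its numbers are invented for illustration, and the policy-evaluation linear system is solved by repeated backups instead of a linear solver, to keep the sketch dependency-free.

```python
# Policy iteration sketch on a hypothetical 2-state, 2-action discounted MDP.

ALPHA = 0.9
P = {0: [[0.8, 0.2], [0.3, 0.7]],
     1: [[0.5, 0.5], [0.1, 0.9]]}
C = {0: [[1.0, 2.0], [2.0, 3.0]],
     1: [[0.5, 3.0], [1.0, 4.0]]}

def evaluate(mu, sweeps=3000):
    # Step 1: J_mu(i) = sum_j P(j, mu(i), i) * [C(j, mu(i), i) + a * J_mu(j)],
    # solved here by fixed-point iteration of the policy's backup operator.
    J = [0.0, 0.0]
    for _ in range(sweeps):
        J = [sum(P[mu[i]][i][j] * (C[mu[i]][i][j] + ALPHA * J[j])
                 for j in range(2)) for i in range(2)]
    return J

def improve(J):
    # Step 2: greedy policy with respect to the evaluated cost-to-go
    return [min(range(2),
                key=lambda u: sum(P[u][i][j] * (C[u][i][j] + ALPHA * J[j])
                                  for j in range(2)))
            for i in range(2)]

mu = [1, 1]                  # arbitrary initial policy mu_0
while True:
    J = evaluate(mu)
    mu_next = improve(J)
    if mu_next == mu:        # the policy solves its own improvement: stop
        break
    mu = mu_next
```

On this tiny example the loop terminates after very few improvements, illustrating why PI is fast when the initial policy is already reasonable.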
6.5 Modified Policy Iteration
If the number of states is large, solving the linear problem of the policy evaluation can be computationally intensive.
An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value J_μk(i).
While m ≥ 0 do

J^m_μk(i) = Σ_{j∈Ω_X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_μk(j)], ∀i ∈ Ω_X

m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_μk is approximated by J^0_μk.
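The modified evaluation step can be sketched as below, under the same kind of hypothetical 2-state discounted model (invented numbers): M backups of the fixed-policy operator are applied from an initial guess chosen above the true value, instead of solving the linear system exactly.

```python
# Modified policy-evaluation sketch: M value-iteration backups for a fixed
# policy mu, on a hypothetical 2-state discounted model.

ALPHA = 0.9
P_MU = [[0.8, 0.2], [0.3, 0.7]]   # transition matrix under the fixed policy
C_MU = [[1.0, 2.0], [2.0, 3.0]]   # transition costs under the fixed policy

def backup(J):
    # one evaluation sweep: J^m(i) = sum_j P * (C + ALPHA * J^(m+1)(j))
    return [sum(P_MU[i][j] * (C_MU[i][j] + ALPHA * J[j]) for j in range(2))
            for i in range(2)]

J = [25.0, 25.0]          # initial guess, chosen higher than the true J_mu
for _ in range(20):       # M = 20 backups only
    J = backup(J)

# reference: (near-)exact evaluation with many more backups
J_exact = [25.0, 25.0]
for _ in range(3000):
    J_exact = backup(J_exact)
```

Because the backup operator is monotone and the guess starts above the true value, the M-step estimate stays an upper bound while its error shrinks geometrically with M.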
6.6 Average Cost-to-go Problems
The methods presented in Sections 5.1-5.4 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent stochastic shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X ∈ Ω_X, there are a unique λ_μ and a vector h_μ such that

h_μ(X) = 0

λ_μ + h_μ(i) = Σ_{j∈Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)], ∀i ∈ Ω_X
This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation:
λ* + h*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X

μ*(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; the method is then called relative value iteration. X is an arbitrary state and h_0(i) is chosen arbitrarily.
H_k = min_{u∈Ω_U(X)} Σ_{j∈Ω_X} P(j, u, X) · [C(j, u, X) + h_k(j)]

h_{k+1}(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)] − H_k, ∀i ∈ Ω_X

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)], ∀i ∈ Ω_X
The sequence h_k converges if the Markov decision process is unichain, and the algorithm then converges to the optimal policy. In theory, the number of iterations needed is infinite.
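The three updates above can be sketched as follows, on a hypothetical unichain 2-state, 2-action average-cost MDP (all transition probabilities positive, so every stationary policy yields a single ergodic class, and all numbers invented); state 0 plays the role of the arbitrary reference state X.

```python
# Relative value iteration sketch on a hypothetical unichain average-cost MDP.

P = {0: [[0.8, 0.2], [0.3, 0.7]],
     1: [[0.5, 0.5], [0.1, 0.9]]}
C = {0: [[1.0, 2.0], [2.0, 3.0]],
     1: [[0.5, 3.0], [1.0, 4.0]]}
REF = 0   # the arbitrary reference state X

def backup(h, i):
    # min over controls of the expected one-stage cost plus differential cost
    return min(sum(P[u][i][j] * (C[u][i][j] + h[j]) for j in range(2))
               for u in range(2))

h = [0.0, 0.0]
for _ in range(500):
    H = backup(h, REF)                       # offset taken at the reference state
    h = [backup(h, i) - H for i in range(2)]  # relative (differential) values

lam = H    # converges to the optimal average cost per stage
policy = [min(range(2),
              key=lambda u: sum(P[u][i][j] * (C[u][i][j] + h[j])
                                for j in range(2)))
          for i in range(2)]
```

For this example the fixed point can be checked by hand: the best stationary policy has average cost 1.8 per stage, and h(X) stays 0 by construction.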
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: X can be chosen arbitrarily.
Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i), ∀i ∈ Ω_X, stop the algorithm.

Else, solve the system of equations

h_q(X) = 0

λ_q + h_q(i) = Σ_{j∈Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)], ∀i ∈ Ω_X
Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_q(j)], ∀i ∈ Ω_X

q = q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case,

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X

and J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω_X} J(i)

Subject to J(i) ≤ Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J(j)], ∀i ∈ Ω_X, ∀u ∈ Ω_U(i)
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. Such problems are referred to as Semi-Markov Decision Processes (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).
SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how a SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Process - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented; they make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly.
The RL methods are extensions of the methods presented in Section 7.2: they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen as similar to modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.
The cost-to-go resulting from the trajectory starting from the state X_k is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i at its m-th visit
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the index of the trajectory.
From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, and can only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).
At each transition of the trajectory, the cost-to-go function of the states of the trajectory is updated. Assuming that the l-th transition is being generated, J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J(X_l) := J(X_l) + γ_{X_l} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]
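The TD(0) update can be sketched as follows for a hypothetical stochastic shortest path chain (all numbers invented): states 0 and 1 are transient, state 2 is the cost-free terminal state, and every transition costs 1, so the exact cost-to-go is the expected number of steps to termination, J(0) = 40/7 and J(1) = 30/7 for this chain, which the simulation-based estimate approaches.

```python
# TD(0) sketch: evaluate a fixed policy on a hypothetical 3-state
# stochastic shortest path chain by simulation.
import random

random.seed(1)

P = [[0.6, 0.3, 0.1],    # P[i][j]: transition probabilities under the policy
     [0.2, 0.5, 0.3]]    # state 2 is the cost-free terminal state
STEP_COST = 1.0          # C(X_k, X_{k+1}) = 1 for every transition

def next_state(i):
    # sample the successor state of i according to P[i]
    r, acc = random.random(), 0.0
    for j, p in enumerate(P[i]):
        acc += p
        if r < acc:
            return j
    return 2

J = [0.0, 0.0, 0.0]      # J(2) stays 0 at the terminal state
visits = [0, 0]

for _ in range(20000):                   # simulated trajectories
    x = random.choice([0, 1])
    while x != 2:
        y = next_state(x)
        visits[x] += 1
        gamma = 1.0 / visits[x]          # diminishing step size gamma_x = 1/m
        J[x] += gamma * (STEP_COST + J[y] - J[x])   # TD(0) update
        x = y
```

Note that each update uses only the current transition, so no complete trajectory has to be stored, which is exactly the advantage of the TD reformulation over policy evaluation by simulation.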
Q-factors
Once J_μk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Q_μk(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_μk(j)]

Note that C(j, u, i) must be known.
The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q_μk(i, u)
This is in fact an approximate version of the policy iteration algorithm, since J_μk and Q_μk have been estimated using the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)    (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]    (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The trade-off between exploration and exploitation: Convergence of the algorithm to the optimal solution requires that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
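The update rule and the exploration/exploitation trade-off can be sketched as follows with an ε-greedy rule on a hypothetical 2-state, 2-action discounted MDP (all numbers invented; discount factor 0.5). For this particular example the exact optimal costs-to-go are J*(0) = 2.8 and J*(1) = 4.8, which min_u Q(i, u) approaches.

```python
# Q-learning sketch with epsilon-greedy exploration on a hypothetical
# 2-state, 2-action discounted MDP.
import random

random.seed(0)

ALPHA = 0.5   # discount factor
EPS = 0.1     # exploration probability
P = {0: [[0.8, 0.2], [0.3, 0.7]],
     1: [[0.5, 0.5], [0.1, 0.9]]}
C = {0: [[1.0, 2.0], [2.0, 3.0]],
     1: [[0.5, 3.0], [1.0, 4.0]]}

def step(i, u):
    # sample the next state and return it with the transition cost
    j = 0 if random.random() < P[u][i][0] else 1
    return j, C[u][i][j]

Q = [[0.0, 0.0], [0.0, 0.0]]
counts = [[0, 0], [0, 0]]
x = 0
for _ in range(200000):
    if random.random() < EPS:                    # exploration phase
        u = random.randrange(2)
    else:                                        # exploitation (greedy policy)
        u = min(range(2), key=lambda a: Q[x][a])
    y, c = step(x, u)
    counts[x][u] += 1
    g = 1.0 / counts[x][u]                       # diminishing step size
    # Q-learning update: Q := (1-g)*Q + g*[cost + alpha * min_u' Q(next, u')]
    Q[x][u] = (1 - g) * Q[x][u] + g * (c + ALPHA * min(Q[y]))
    x = y
```

Note that the update never uses the transition probabilities P directly; only the sampled transitions are needed, which is the point of direct learning.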
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation using direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the table representation investigated previously, J_μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).
There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, and Bayesian statistics.
A general approach to a supervised learning problem can be:
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples; this is already an approximation of the real function.
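The idea of storing only a parameter vector r can be sketched with the simplest possible approximator: a least-squares straight-line fit J̃(i, r) = r_0 + r_1 · i to hypothetical noisy samples of a cost-to-go function (all numbers invented). The fitted function then generalizes to a state that was never visited.

```python
# Least-squares fit of a linear cost-to-go approximation J~(i, r) = r0 + r1*i
# from hypothetical (state index, observed cost-to-go) samples.

samples = [(0, 2.1), (2, 3.0), (4, 4.05), (6, 4.9), (8, 6.1)]

n = len(samples)
mean_i = sum(i for i, _ in samples) / n
mean_v = sum(v for _, v in samples) / n
# closed-form simple linear regression: slope = cov(i, v) / var(i)
slope = (sum((i - mean_i) * (v - mean_v) for i, v in samples)
         / sum((i - mean_i) ** 2 for i, _ in samples))
intercept = mean_v - slope * mean_i
r = (intercept, slope)        # only the parameter vector r is stored

def J_tilde(i, r):
    return r[0] + r[1] * i

unseen = J_tilde(5, r)        # prediction for the unvisited state i = 5
```

Real RL applications would use richer structures (neural networks, kernels, trees), but the principle is the same: the samples determine r, and r replaces the table.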
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], a SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure, during the stage, of a unit not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Process
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance or major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.
The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is revealed. Penalties are defined by deviations from the normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality: it means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. These models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature considered only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an existing model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model; possible approaches: average cost-to-go, discounted, shortest path
- Possible applications in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
- Methods: classical methods for MDP: value iteration (VI), whose convergence speed depends on the discount factor; policy iteration (PI), faster in general; linear programming, which allows additional constraints but handles a more limited state space than VI and PI

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval; complex (average cost-to-go approach)
- Possible application in maintenance optimization: optimization of inspection-based maintenance
- Methods: same as MDP

Approximate Dynamic Programming
- Characteristics: can handle larger state spaces than classical MDP methods; can work without an explicit model
- Possible application in maintenance optimization: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices before doing maintenance.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and to avoid maintenance during a profitable period. This idea was taken into account in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions for the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE: Number of electricity scenarios
NW: Number of working states for the component
NPM: Number of preventive maintenance states for one component
NCM: Number of corrective maintenance states for one component
Costs
CE(s, k): Electricity cost at stage k for the electricity state s
CI: Cost per stage for interruption
CPM: Cost per stage of preventive maintenance
CCM: Cost per stage of corrective maintenance
CN(i): Terminal cost if the component is in state i
Variables
i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage
State and Control Space
x1k: Component state at stage k
x2k: Electricity state at stage k
Probability function
λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state Wi
Sets
Ωx1: Component state space
Ωx2: Electricity state space
ΩU(i): Decision space for state i
States notations
W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k), x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W), when the component is working; corrective maintenance (CM) states, if the component is in maintenance due to failure; and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; Tmax can then correspond, for example, to the time such that λ(t) > 50% if t > Tmax. This second approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
[Figure: state transition diagram with states W0–W4, PM1, CM1, CM2. Under u = 0 (solid lines), each Wq moves to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); W4 stays in W4 with probability 1 − Ts·λ(4). Under u = 1 (dashed lines), each Wq moves to PM1 with probability 1. The maintenance states progress deterministically, each with probability 1: CM1 → CM2 → W0 and PM1 → W0.]
Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios, corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure: electricity price in SEK/MWh (roughly between 200 and 500) plotted over stages k−1, k, k+1 for Scenario 1, Scenario 2 and Scenario 3.]

Figure 9.2: Example of electricity scenarios, NE = 3
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:
Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance
The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}
        ∅ else
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:
P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 (respectively CM1) corresponds to W0.
Electricity State
The transition probabilities of the electricity state Pk(j2, i2) are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E. i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                        | u | j1    | P(j1, u, i1)
--------------------------|---|-------|-------------
Wq, q ∈ {0, ..., NW−1}    | 0 | Wq+1  | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}    | 0 | CM1   | λ(Wq)
WNW                       | 0 | WNW   | 1 − λ(WNW)
WNW                       | 0 | CM1   | λ(WNW)
Wq, q ∈ {0, ..., NW}      | 1 | PM1   | 1
PMq, q ∈ {1, ..., NPM−2}  | ∅ | PMq+1 | 1
PMNPM−1                   | ∅ | W0    | 1
CMq, q ∈ {1, ..., NCM−2}  | ∅ | CMq+1 | 1
CMNCM−1                   | ∅ | W0    | 1
Table 9.2: Example of transition matrices for electricity scenarios

P1E =
| 1 0 0 |
| 0 1 0 |
| 0 0 1 |

P2E =
| 1/3 1/3 1/3 |
| 1/3 1/3 1/3 |
| 1/3 1/3 1/3 |

P3E =
| 0.6 0.2 0.2 |
| 0.2 0.6 0.2 |
| 0.2 0.2 0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):   0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):  P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
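The schedule of Tables 9.2 and 9.3 is easy to exercise in code. The sketch below (function names are illustrative) samples one scenario path over the 12-stage horizon; during the stages governed by the identity matrix P1E the scenario cannot change:

```python
import random

P1E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]           # frozen scenario
P2E = [[1/3, 1/3, 1/3] for _ in range(3)]                           # fully mixing
P3E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]           # sticky
SCHEDULE = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]  # Table 9.3

def sample_next(row, rnd):
    """Draw the next scenario index from one matrix row."""
    r, acc = rnd.random(), 0.0
    for j, p in enumerate(row):
        acc += p
        if r < acc:
            return j
    return len(row) - 1             # guard against floating-point rounding

def simulate(s0, schedule, rnd):
    """Scenario path s0, s1, ..., s12 over the 12-stage horizon."""
    path = [s0]
    for P in schedule:
        path.append(sample_next(P[path[-1]], rnd))
    return path

path = simulate(1, SCHEDULE, random.Random(0))
```

Because the first three matrices are the identity, the first four entries of any sampled path equal the initial scenario, whatever the random seed.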
9.1.4.4 Cost Function
The costs associated to the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                        | u | j1    | Ck(j, u, i)
--------------------------|---|-------|--------------------
Wq, q ∈ {0, ..., NW−1}    | 0 | Wq+1  | G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}    | 0 | CM1   | CI + CCM
WNW                       | 0 | WNW   | G · Ts · CE(i2, k)
WNW                       | 0 | CM1   | CI + CCM
Wq                        | 1 | PM1   | CI + CPM
PMq, q ∈ {1, ..., NPM−2}  | ∅ | PMq+1 | CI + CPM
PMNPM−1                   | ∅ | W0    | CI + CPM
CMq, q ∈ {1, ..., NCM−2}  | ∅ | CMq+1 | CI + CCM
CMNCM−1                   | ∅ | W0    | CI + CCM
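Tables 9.1 and 9.4 together define a finite horizon problem that backward value iteration can solve. The sketch below is a deliberately reduced version: a single electricity scenario (so the second state variable disappears), invented failure probabilities and costs, and the production reward entered as a negative cost so that everything is minimized. None of the numbers come from the thesis:

```python
# Reduced one-component model: NW = 4, NPM = 2, NCM = 3, one electricity scenario.
N_W, N_PM, N_CM = 4, 2, 3
LAM = [0.05, 0.10, 0.20, 0.35, 0.50]    # per-stage failure probability in Wq (assumed)
REWARD = 8.0                             # G * Ts * CE per operating stage (assumed)
C_I, C_PM, C_CM = 3.0, 4.0, 10.0         # interruption / maintenance costs (assumed)
N = 12                                   # number of stages

N_STATES = (N_W + 1) + (N_PM - 1) + (N_CM - 1)   # W0..W4, PM1, CM1, CM2 -> 8 states
PM1, CM1 = N_W + 1, N_W + N_PM

def transitions(i, u):
    """Yield (probability, next state, transition cost) per Tables 9.1 and 9.4."""
    if i <= N_W:                                   # working state Wq
        if u == 1:
            yield 1.0, PM1, C_I + C_PM             # start preventive replacement
        else:
            nxt = i + 1 if i < N_W else N_W        # age one stage (WNW stays put)
            yield 1.0 - LAM[i], nxt, -REWARD       # operate and earn the reward
            yield LAM[i], CM1, C_I + C_CM          # fail -> CM1
    elif i < CM1:                                  # PM chain, ends back at W0
        yield 1.0, (i + 1 if i < PM1 + N_PM - 2 else 0), C_I + C_PM
    else:                                          # CM chain, ends back at W0
        yield 1.0, (i + 1 if i < CM1 + N_CM - 2 else 0), C_I + C_CM

J = [0.0] * N_STATES                               # terminal cost CN(i) = 0 (assumed)
policy = []
for k in reversed(range(N)):                       # backward value iteration
    Jk, Uk = [], []
    for i in range(N_STATES):
        actions = [0, 1] if 1 <= i <= N_W else [0]   # decision space of the model
        cost = {u: sum(p * (c + J[j]) for p, j, c in transitions(i, u)) for u in actions}
        u = min(cost, key=cost.get)
        Jk.append(cost[u]); Uk.append(u)
    J, policy = Jk, [Uk] + policy
```

With these invented numbers, the stage-0 policy prescribes preventive replacement in the oldest working state W4, and the cost-to-go from a new component is lower than from a failed one; both are properties one would expect of the model, not guarantees about other parameter choices.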
9.2 Multi-Component Model
In this section the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.
This could be very interesting if the interruption cost is high, or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC: Number of components
NWc: Number of working states for component c
NPMc: Number of preventive maintenance states for component c
NCMc: Number of corrective maintenance states for component c
Costs
CPMc: Cost per stage of preventive maintenance for component c
CCMc: Cost per stage of corrective maintenance for component c
CNc(i): Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}: State of component c at the current stage
iNC+1: State of the electricity at the current stage
jc, c ∈ {1, ..., NC}: State of component c for the next stage
jNC+1: State of the electricity for the next stage
uc, c ∈ {1, ..., NC}: Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}: State of component c at stage k
xc: A component state
xNC+1k: Electricity state at stage k
uck: Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc: State space for component c
ΩxNC+1: Electricity state space
Ωuc(ic): Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component, to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.
Component Space
The number of CM and PM states for component c corresponds respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity Space
Same as in the one-component model (Section 9.1.4.1).
9.2.4.2 Decision Space
At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}
                             ∅ else
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state Pk(jNC+1, iNC+1) are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.4.3.
Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)
Case 2

If one of the components is in maintenance, or preventive maintenance is decided for some component:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC Pc

with Pc = P(jc, 1, ic) if uc = 1 or ic ∉ {W1, ..., WNWc}
          1 if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
          0 else
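The two cases can be combined into a single helper. The sketch below (all names, and the tiny two-state component model used to exercise it, are illustrative assumptions rather than thesis notation) stores each component's individual transition probabilities in a dictionary keyed by (jc, u, ic), with the forced maintenance-state transitions filed under u = 1:

```python
def joint_transition_prob(j_vec, u_vec, i_vec, P, is_working):
    """Joint transition probability for the multi-component model.

    P[c][(j, u, i)]  -- individual transition probability of component c
    is_working(i)    -- True if i is one of the working states W1..WNWc
    """
    n = len(i_vec)
    if all(is_working(i) for i in i_vec) and not any(u_vec):
        # Case 1: system up, nothing maintained -> components age independently
        prob = 1.0
        for c in range(n):
            prob *= P[c].get((j_vec[c], 0, i_vec[c]), 0.0)
        return prob
    # Case 2: system down -> maintained components progress, idle working ones stay put
    prob = 1.0
    for c in range(n):
        if u_vec[c] == 1 or not is_working(i_vec[c]):
            prob *= P[c].get((j_vec[c], 1, i_vec[c]), 0.0)
        else:
            prob *= 1.0 if j_vec[c] == i_vec[c] else 0.0
    return prob

# Tiny illustration: two identical components, state 0 = working, state 1 = in maintenance.
P_c = {(0, 0, 0): 0.9, (1, 0, 0): 0.1,   # operate: survive or fail
       (1, 1, 0): 1.0,                   # start maintenance
       (0, 1, 1): 1.0}                   # maintenance finishes -> working
P = [P_c, P_c]
```

With both components up and idle, the joint probability of both surviving is 0.9 · 0.9; once one component is in maintenance, the other is frozen in its current state, exactly as Case 2 requires.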
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.
Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)
Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑c=1..NC Cc

with Cc = CCMc if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
          CPMc if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
          0 else
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm empirically proves to converge the fastest; however, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. ADP methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models, where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {...} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {...} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {...} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {...} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {...} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {...} = 2
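The hand computation above can be checked mechanically. The sketch below encodes the arc costs C(k, i, u) of the example (the decision u is the index of the next-stage node) and runs the same backward recursion, then reads off the optimal route forwards:

```python
# Arc costs C[k][(i, u)]: from state i at stage k, decision u leads to state u at stage k+1.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},            # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6,                        # B -> E, F
        (1, 0): 2, (1, 1): 1, (1, 2): 3,             # C -> E, F, G
        (2, 1): 5, (2, 2): 2},                       # D -> F, G
    2: {(0, 0): 2, (0, 1): 5,                        # E -> H, I
        (1, 0): 7, (1, 1): 3, (1, 2): 2,             # F -> H, I, J
        (2, 1): 1, (2, 2): 2},                       # G -> I, J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},            # H, I, J -> K
}

J = {4: {0: 0}}                                      # terminal cost at K
U = {}
for k in (3, 2, 1, 0):                               # backward value iteration
    J[k], U[k] = {}, {}
    for (i, u), c in sorted(C[k].items()):
        cand = c + J[k + 1][u]
        if i not in J[k] or cand < J[k][i]:
            J[k][i], U[k][i] = cand, u

# Forward pass: read off the optimal route starting from A.
names = {0: ["A"], 1: ["B", "C", "D"], 2: ["E", "F", "G"], 3: ["H", "I", "J"], 4: ["K"]}
s, route = 0, ["A"]
for k in range(4):
    s = U[k][s]
    route.append(names[k + 1][s])
```

Running this reproduces the stage-by-stage values above: J[0][0] is 8 and the route is A, D, G, I, K (at node C the tie between decisions 1 and 2 is broken in favour of the first).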
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Thyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
4.2.2 The Optimality Equation and Value Iteration Algorithm
The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:
J*k(i) = min u∈ΩUk(i) {Ck(i, u) + J*k+1(fk(i, u))}   (4.1)

J*k(i): optimal cost-to-go from stage k to N, starting from state i.

The value iteration algorithm is a direct consequence of the optimality equation:

J*N(i) = CN(i), ∀i ∈ XN
J*k(i) = min u∈ΩUk(i) {Ck(i, u) + J*k+1(fk(i, u))}, ∀i ∈ Xk
U*k(i) = argmin u∈ΩUk(i) {Ck(i, u) + J*k+1(fk(i, u))}, ∀i ∈ Xk

u: decision variable
U*k(i): optimal decision (action) at stage k for state i
The algorithm goes backwards, starting from the last stage. It stops when k = 0.
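For a deterministic finite horizon problem of this form, the recursion translates almost line by line into code. The sketch below is generic (the function and argument names are my own, not the thesis notation): the states, the admissible controls, the dynamics fk and the costs Ck are passed in as callables:

```python
def value_iteration(n, states, controls, f, cost, terminal_cost):
    """Backward value iteration for a deterministic finite horizon problem.

    states[k]        -- iterable of states at stage k (k = 0..n)
    controls(k, i)   -- admissible decisions in state i at stage k
    f(k, i, u)       -- dynamics: next state
    cost(k, i, u)    -- stage cost C_k(i, u)
    terminal_cost(i) -- terminal cost C_N(i)
    Returns cost-to-go J[(k, i)] and optimal decisions U[(k, i)].
    """
    J = {(n, i): terminal_cost(i) for i in states[n]}
    U = {}
    for k in reversed(range(n)):                      # backwards from the last stage
        for i in states[k]:
            best = min(controls(k, i),
                       key=lambda u: cost(k, i, u) + J[(k + 1, f(k, i, u))])
            U[(k, i)] = best
            J[(k, i)] = cost(k, i, best) + J[(k + 1, f(k, i, best))]
    return J, U
```

As a toy check, a two-stage problem where stage 0 chooses between an expensive arc into a cheap state (5 + 1 = 6) and a cheap arc into an expensive state (1 + 10 = 11) yields a cost-to-go of 6 and picks the first option.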
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: layered graph over five stages.

          B         E         H
A         C         F         I         K
          D         G         J
Stage 0   Stage 1   Stage 2   Stage 3   Stage 4

Arc costs:
A→B = 2, A→C = 4, A→D = 3
B→E = 4, B→F = 6; C→E = 2, C→F = 1, C→G = 3; D→F = 5, D→G = 2
E→H = 2, E→I = 5; F→H = 7, F→I = 3, F→J = 2; G→I = 1, G→J = 2
H→K = 4, I→K = 2, J→K = 7]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated to each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2 + 6 + 2 + 7 = 17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively, to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:

Ω^X_0 = {A} = {0}, Ω^X_1 = {B, C, D} = {0, 1, 2}, Ω^X_2 = {E, F, G} = {0, 1, 2},
Ω^X_3 = {H, I, J} = {0, 1, 2}, Ω^X_4 = {K} = {0}
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable; it is also possible to have a multi-variable state space, for which X_k would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice at each node is which way to take to reach the next stage. The following notations are used:
Ω^U_k(i) = {0, 1} for i = 0; {0, 1, 2} for i = 1; {1, 2} for i = 2, for k = 1, 2, 3

Ω^U_0(0) = {0, 1, 2} for k = 0
For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with u_1(0) = 0 for the transition B ⇒ E and u_1(0) = 1 for the transition B ⇒ F.
Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F and u_1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notation used: f_k(i, u) = u.
The transition costs are defined equal to the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function
J*_0(0) = min_{U_k∈Ω^U_k(X_k)} [ Σ_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) ]

Subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., N − 1
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem. The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.
The solution of the algorithm is given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2 and μ_1(2) = 2).
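The backward recursion can be sketched in code. The arc costs below are read off the example figure; since the figure did not extract cleanly, the exact assignment of a few costs to arcs is a reconstruction, though it reproduces the optimal cost J*_0(0) = 8 and the path A ⇒ D ⇒ G ⇒ I ⇒ K.

```python
# Backward value iteration for the deterministic shortest path example.
# States are indexed 0, 1, 2 within each stage (A=0; B,C,D=0,1,2; ...).
# cost[k][(i, j)]: cost of the arc from state i at stage k to state j at k+1.
cost = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},                       # A -> B, C, D
    {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3,
     (2, 1): 5, (2, 2): 2},                                  # B, C, D -> E, F, G
    {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2,
     (2, 1): 1, (2, 2): 2},                                  # E, F, G -> H, I, J
    {(0, 0): 4, (1, 0): 2, (2, 0): 7},                       # H, I, J -> K
]

N = len(cost)
J = [dict() for _ in range(N + 1)]
policy = [dict() for _ in range(N)]
J[N] = {0: 0}                                # terminal cost C_N(K) = 0
for k in range(N - 1, -1, -1):               # backward recursion
    for i in {i for (i, _) in cost[k]}:
        candidates = {j: c + J[k + 1][j]     # cost-to-go per admissible arc
                      for (s, j), c in cost[k].items() if s == i}
        u = min(candidates, key=candidates.get)
        policy[k][i], J[k][i] = u, candidates[u]

# Forward pass: rebuild the optimal path from the optimal policy
state, path = 0, [0]
for k in range(N):
    state = policy[k][state]
    path.append(state)

print(J[0][0], path)   # expected: 8 and the states of the path A, D, G, I, K
```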
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. It introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
51 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is then not deterministic as in Chapter 4: it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω^X_k.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).
Dynamic of the System and Transition Probability
In contrast to the deterministic case, the state transition depends not only on the control used but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N − 1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control are i and u at stage k. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that minimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k∈Ω^U_k(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

Subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N − 1
N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
X_k: State at stage k
U_k: Decision (action) at stage k
ω_k(i, u): Probabilistic function of the disturbance
C_k(i, u, j): Cost function
C_N(i): Terminal cost for state i
f_k(i, u, ω): Dynamic function
J*_0(i): Optimal cost-to-go starting from state i
52 Optimality Equation
The optimality equation for stochastic finite horizon DP is
J*_k(i) = min_{u∈Ω^U_k(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]    (5.1)
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(i, u, j) · [C_k(i, u, j) + J*_{k+1}(j)]    (5.2)
Ω^X_k: State space at stage k
Ω^U_k(i): Decision space at stage k for state i
P_k(j, u, i): Transition probability function
53 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursions, it determines at each stage the optimal decision for each state of the system.
J*_N(i) = C_N(i)  ∀i ∈ Ω^X_N  (Initialisation)

While k ≥ 0 do
  J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(i, u, j) · [C_k(i, u, j) + J*_{k+1}(j)]  ∀i ∈ Ω^X_k
  U*_k(i) = argmin_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(i, u, j) · [C_k(i, u, j) + J*_{k+1}(j)]  ∀i ∈ Ω^X_k
  k ← k − 1
u: Decision variable
U*_k(i): Optimal decision (action) at stage k for state i

The recursion finishes when the first stage is reached.
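The recursion above can be sketched as follows, on an assumed two-state component replacement model. The states, probabilities and costs below are illustrative, not taken from the thesis; they simply instantiate P_k(j, u, i) and C_k(j, u, i) for a stationary system.

```python
# Value iteration for a finite-horizon stochastic replacement problem.
# Illustrative two-state model: 0 = working, 1 = failed (assumed numbers).

N = 4                       # number of stages
FAIL_P = 0.3                # failure probability per stage
REPLACE_COST = 5.0
DOWNTIME_COST = 10.0        # cost of ending a stage in the failed state

STATES = (0, 1)
ACTIONS = (0, 1)            # 0 = do nothing, 1 = replace

def transition(i, u):
    """P(j, u, i) as a dict {j: probability} (stationary here)."""
    if u == 1 or i == 0:                   # new or working component may fail
        return {0: 1.0 - FAIL_P, 1: FAIL_P}
    return {1: 1.0}                        # a failed component stays failed

def cost(i, u, j):
    """C(j, u, i): replacement cost plus downtime cost."""
    return (REPLACE_COST if u == 1 else 0.0) + (DOWNTIME_COST if j == 1 else 0.0)

J = {i: 0.0 for i in STATES}               # terminal cost C_N(i) = 0
policy = []
for k in range(N - 1, -1, -1):             # backward recursion over stages
    Jk, pk = {}, {}
    for i in STATES:
        q = {u: sum(p * (cost(i, u, j) + J[j])
                    for j, p in transition(i, u).items())
             for u in ACTIONS}
        pk[i] = min(q, key=q.get)          # U*_k(i)
        Jk[i] = q[pk[i]]                   # J*_k(i)
    J, policy = Jk, [pk] + policy

print(J)          # expected cost-to-go from each state at stage 0
print(policy[0])  # optimal first-stage decision per state
```

Under these assumed numbers, the optimal first-stage rule is to do nothing while the component works and replace it once it has failed.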
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with
• N stages,

• N_X state variables, where the size of the set for each state variable is S,

• N_U control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem thus increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time, so a possible state variable for the component is its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be taken into account to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.
Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures: minor failures could be cleared by repair, while after a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This would reduce the uncertainties, but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. When consumption is low, some generation units are stopped, and this time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions at offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on time, if the system dynamics are not stationary).
This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage, since this would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models -
Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov decision process. For more details and proofs of convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space into the control space: for i ∈ Ω^X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω^U(i).
The objective is to find the optimal policy μ*, i.e. the one that minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are paid:
J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1

μ: Decision policy
J*(i): Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost function for discounted IHSDP has the form α^k · C_ij(u). As C_ij(u) is bounded, the infinite sum converges (decreasing geometric progression):
J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1
α Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be modelled with a cost-free termination state or with discounting.
To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1
6.2 Optimality Equations
The optimality equations are formulated using the transition probability function P(j, u, i).

The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P_ij(u) · [C_ij(u) + J*(j)]  ∀i ∈ Ω^X

J_μ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is

J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P_ij(u) · [C_ij(u) + α · J*(j)]  ∀i ∈ Ω^X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed does. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative criterion must be determined to stop the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm, which terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively; the process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ_0 and can then be described by the following steps.
Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μ_q}(i), the solution of the following linear system, is calculated:

J_{μ_q}(i) = Σ_{j∈Ω^X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_{μ_q}(j)]  ∀i ∈ Ω^X

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.
Step 2: Policy Improvement

A new policy is obtained using a value iteration step:

μ_{q+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + J_{μ_q}(j)]  ∀i ∈ Ω^X

Go back to the policy evaluation step. The process stops when μ_{q+1} = μ_q.
At each iteration the algorithm improves the policy. If the initial policy μ_0 is already good, the algorithm will converge quickly to the optimal solution.
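The two steps can be sketched compactly for the discounted variant, where the evaluation step solves the linear system (I − α·P_μ)J = c_μ directly. The two-state repair model below is an assumption for illustration, not taken from the thesis.

```python
import numpy as np

# Policy iteration for a discounted MDP (alpha = 0.9) on an illustrative
# two-state repair model (0 = good, 1 = bad); all numbers are assumptions.

ALPHA = 0.9
# P[u][i, j]: transition probability; c[u][i]: expected one-stage cost
P = {0: np.array([[0.8, 0.2], [0.0, 1.0]]),   # u = 0: run
     1: np.array([[1.0, 0.0], [1.0, 0.0]])}   # u = 1: repair
c = {0: np.array([1.0, 5.0]), 1: np.array([3.0, 3.0])}

policy = np.array([0, 0])                     # initial policy mu_0: always run
while True:
    # Step 1: policy evaluation -- solve (I - alpha * P_mu) J = c_mu
    P_mu = np.array([P[u][i] for i, u in enumerate(policy)])
    c_mu = np.array([c[u][i] for i, u in enumerate(policy)])
    J = np.linalg.solve(np.eye(2) - ALPHA * P_mu, c_mu)
    # Step 2: policy improvement
    Q = np.array([c[u] + ALPHA * P[u] @ J for u in (0, 1)])  # Q[u, i]
    new_policy = Q.argmin(axis=0)
    if np.array_equal(new_policy, policy):
        break                                 # mu solves its own improvement
    policy = new_policy

print(policy, J)   # converges in a few iterations: run when good, repair when bad
```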
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i).
While m ≥ 0 do
  J^m_{μ_k}(i) = Σ_{j∈Ω^X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j)]  ∀i ∈ Ω^X
  m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X̄ ∈ Ω^X, there are a unique λ_μ and a vector h_μ such that

h_μ(X̄) = 0
λ_μ + h_μ(i) = Σ_{j∈Ω^X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)]  ∀i ∈ Ω^X
This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ Ω^X

μ*(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ Ω^X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems, in which case it is called relative value iteration. X̄ is an arbitrary state and h_0(i) is chosen arbitrarily:
H_k = min_{u∈Ω^U(X̄)} Σ_{j∈Ω^X} P(j, u, X̄) · [C(j, u, X̄) + h_k(j)]

h_{k+1}(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h_k(j)] − H_k  ∀i ∈ Ω^X

μ_{k+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h_k(j)]  ∀i ∈ Ω^X
The sequence h_k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
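A sketch of this recursion on an assumed two-state repair model (the transition probabilities and costs are illustrative, not taken from the thesis). The offset H_k is the backup value at the reference state and converges to the optimal average cost λ*.

```python
# Relative value iteration for an average cost per stage problem.
# Illustrative two-state repair model; all numbers are assumptions.

STATES = (0, 1)             # 0 = good condition, 1 = bad condition
ACTIONS = (0, 1)            # 0 = run, 1 = repair

def transition(i, u):
    """P(j, u, i) as a dict {j: probability}."""
    if u == 1:
        return {0: 1.0}                    # repair restores the good state
    return {0: 0.8, 1: 0.2} if i == 0 else {1: 1.0}

def cost(i, u):
    if u == 1:
        return 3.0                         # repair cost
    return 1.0 if i == 0 else 5.0          # running cost, higher when degraded

def backup(i, h):
    """One Bellman backup min_u sum_j P * [C + h(j)] at state i."""
    return min(sum(p * (cost(i, u) + h[j])
                   for j, p in transition(i, u).items())
               for u in ACTIONS)

REF = 0                                    # arbitrary reference state (X-bar)
h = {i: 0.0 for i in STATES}               # h_0 chosen arbitrarily
for _ in range(500):
    H = backup(REF, h)                     # offset H_k, tends to lambda*
    h = {i: backup(i, h) - H for i in STATES}
lam = H

print(lam, h)   # optimal average cost per stage and relative values
```

For these assumed numbers the optimal stationary policy is to run in the good state and repair in the bad state, with average cost λ* = 4/3 per stage.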
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: X̄ can be chosen arbitrarily.
Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω^X, stop the algorithm. Else, solve the system of equations

h_q(X̄) = 0
λ_q + h_q(i) = Σ_{j∈Ω^X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)]  ∀i ∈ Ω^X
Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h_q(j)]  ∀i ∈ Ω^X

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case the optimal cost-to-go function satisfies

J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + α · J*(j)]  ∀i ∈ Ω^X

and J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω^X} J(i)

Subject to J(i) ≤ Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + α · J(j)]  ∀i ∈ Ω^X, ∀u ∈ Ω^U(i)
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
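The LP reformulation can be sketched with a generic solver. The two-state repair model and discount factor below are assumptions for illustration; each row of the constraint matrix corresponds to one state-action pair.

```python
import numpy as np
from scipy.optimize import linprog

# LP formulation of a discounted MDP (alpha = 0.9) on an illustrative
# two-state repair model (0 = good, 1 = bad); all numbers are assumptions.
# Variables: J(0), J(1). Maximize sum(J) subject to
#   J(i) <= sum_j P(j,u,i) * [C(j,u,i) + alpha * J(j)]   for all (i, u).

alpha = 0.9
# One row per state-action pair: J(i) - alpha * sum_j P*J(j) <= E[cost]
A_ub = np.array([
    [1 - alpha * 0.8, -alpha * 0.2],   # i=0, u=run:    P = (0.8, 0.2), cost 1
    [1 - alpha,        0.0        ],   # i=0, u=repair: P = (1.0, 0.0), cost 3
    [0.0,              1 - alpha  ],   # i=1, u=run:    P = (0.0, 1.0), cost 5
    [-alpha,           1.0        ],   # i=1, u=repair: P = (1.0, 0.0), cost 3
])
b_ub = np.array([1.0, 3.0, 5.0, 3.0])

# linprog minimizes, so maximize sum(J) by minimizing -sum(J)
res = linprog(c=[-1.0, -1.0], A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (None, None)])
J = res.x
print(J)   # optimal cost-to-go for both states
```

The binding constraints at the optimum identify the optimal action in each state, which is how the optimal policy is read off the LP solution.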
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms [28] and [29] are recommended
If n and m denote the number of states and actions, this means that a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].
Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and the actions are not taken continuously (that kind of problem refers to optimal control theory).
SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach from machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function over a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k), where X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample of the cost-to-go function.
TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation: Assume that a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.
The cost-to-go resulting from the trajectory starting from the state X_k is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i at its mth visit
A recursive form of the method can be formulated:

J(i) ← J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.
From a trajectory point of view:

J(X_k) ← J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

with γ_{X_k} corresponding to 1/m, where m is the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can therefore be used only once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).
At each transition of the trajectory, the cost-to-go function of the states visited so far is updated. Assume that the l-th transition is being generated; then J(X_k) is updated for all the states that have been visited previously during the trajectory:
J(X_k) ← J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]  ∀k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J(X_k) ← J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]  ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0; the TD(0) algorithm is

J(X_k) ← J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
Q-factors
Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_{μ_k}(i, u) = Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + J_{μ_k}(j)]

Note that P(j, u, i) and C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω^U(i)} Q_{μ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated from the samples.
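The TD(0) update can be sketched on an assumed tiny stochastic shortest path chain (the chain, costs, and fixed policy below are illustrative, not from the thesis; for this chain the exact cost-to-go is J(0) = 5, J(1) = 4).

```python
import random

# TD(0) policy evaluation on a tiny stochastic shortest path chain:
# state 0 -> state 1 (cost 1); state 1 -> terminal with prob. 0.5 (cost 2)
# or back to state 0 (cost 1). All numbers are assumptions for the sketch.

random.seed(0)
TERMINAL = 2
J = {0: 0.0, 1: 0.0}       # cost-to-go estimates
visits = {0: 0, 1: 0}

def step(state):
    """Sample (next_state, cost) under the fixed policy."""
    if state == 0:
        return 1, 1.0
    return (TERMINAL, 2.0) if random.random() < 0.5 else (0, 1.0)

for _ in range(20000):                     # simulated trajectories
    state = 0
    while state != TERMINAL:
        nxt, c = step(state)
        visits[state] += 1
        gamma = 1.0 / visits[state]        # decreasing step size gamma_Xk
        target = c + (0.0 if nxt == TERMINAL else J[nxt])
        J[state] += gamma * (target - J[state])   # TD(0) update
        state = nxt

print(J)   # estimates approach the exact values {0: 5.0, 1: 4.0}
```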
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω^U(i)} Q*(i, u)    (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω^U(j)} Q*(j, v)]    (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = argmin_{u∈Ω^U(X_k)} Q(X_k, u)

Q(X_k, U_k) ← (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω^U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The trade-off between exploration and exploitation: The convergence of the algorithm to the optimal solution would require that all pairs (i, u) are tried infinitely often, which is not realistic.
In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
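A sketch of tabular Q-learning with an ε-greedy exploration scheme, on an assumed two-state repair model with discount factor 0.9 (all numbers are illustrative, not taken from the thesis).

```python
import random

# Tabular Q-learning with epsilon-greedy exploration on an illustrative
# two-state repair MDP (0 = good, 1 = bad); all numbers are assumptions.

random.seed(1)
ALPHA = 0.9                 # discount factor
EPS = 0.2                   # exploration probability
STATES, ACTIONS = (0, 1), (0, 1)   # actions: 0 = run, 1 = repair

def simulate(i, u):
    """Sample (next_state, cost); the model is only used as a simulator."""
    if u == 1:
        return 0, 3.0                       # repair: back to good, cost 3
    if i == 0:
        return (1, 1.0) if random.random() < 0.2 else (0, 1.0)
    return 1, 5.0                           # running while degraded

Q = {(i, u): 0.0 for i in STATES for u in ACTIONS}
visits = {k: 0 for k in Q}

state = 0
for _ in range(200000):
    # epsilon-greedy choice of the control: explore or follow greedy policy
    if random.random() < EPS:
        u = random.choice(ACTIONS)
    else:
        u = min(ACTIONS, key=lambda a: Q[(state, a)])
    nxt, c = simulate(state, u)
    visits[(state, u)] += 1
    gamma = 1.0 / visits[(state, u)]        # decreasing learning rate
    target = c + ALPHA * min(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, u)] += gamma * (target - Q[(state, u)])
    state = nxt

policy = {i: min(ACTIONS, key=lambda a: Q[(i, a)]) for i in STATES}
print(policy)   # expected greedy policy: run when good, repair when bad
```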
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that is optimized based on the available samples of J_μ. In the table representation investigated previously, J_μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J(i, r).
There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.
A general approach to a supervised learning problem can be
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.
• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.
• Decide on a training algorithm.
• Gather a training set.
• Train the function with the training set. The function can then be validated using a subset of the training set.
• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training set is obtained either by simulation or from real-time samples. This is already an approximation of the real function.
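As a minimal illustration of such an approximation structure, the sketch below fits a linear architecture J̃(i, r) = r0 + r1·i to sampled cost-to-go values by least squares. The samples and the single linear feature are hypothetical, chosen only to show that just the vector r = (r0, r1) is stored rather than one table entry per state.

```python
def fit_linear_cost_to_go(samples):
    """Least-squares fit of J~(i, r) = r0 + r1 * i to sampled
    (state, cost-to-go) pairs; only the vector r is stored."""
    n = len(samples)
    sx = sum(i for i, _ in samples)
    sy = sum(j for _, j in samples)
    sxx = sum(i * i for i, _ in samples)
    sxy = sum(i * j for i, j in samples)
    r1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    r0 = (sy - r1 * sx) / n                           # intercept
    return r0, r1

# Hypothetical samples of a cost-to-go that happens to be linear in the state:
samples = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0)]
r0, r1 = fit_linear_cost_to_go(samples)
print(round(r0, 6), round(r1, 6))  # 1.0 2.0
```

With a richer feature vector the same idea carries over: the regression weights play the role of r, and generalization to unvisited states comes from the chosen features.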
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure during the stage of a unit that is not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature considered only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods
Finite Horizon Dynamic Programming. Characteristics: the model can be non-stationary. Possible application in maintenance optimization: short-term maintenance scheduling. Method: value iteration. Disadvantage: limited state space (number of components).

Markov Decision Processes. Characteristics: stationary model. Classical methods; possible approaches for MDP:
- Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI), which can converge fast for a high discount factor.
- Discounted: short-term maintenance optimization; policy iteration (PI), faster in general.
- Shortest path: linear programming, which allows possible additional constraints but a more limited state space than VI and PI.

Semi-Markov Decision Processes. Characteristics: can optimize the inspection interval. Possible application: optimization of inspection-based maintenance. Method: same as MDP. Disadvantage: complex (average cost-to-go approach).

Approximate Dynamic Programming for MDP. Characteristics: can handle large state spaces. Possible application: same as MDP, for systems larger than classical MDP methods can handle. Methods: TD-learning, Q-learning. Advantage: can work without an explicit model.
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices before doing maintenance.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and to avoid maintenance during a profitable period. This idea was incorporated in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries a large part of the electricity is based on hydropower. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE  Number of electricity scenarios
NW  Number of working states for the component
NPM  Number of preventive maintenance states for the component
NCM  Number of corrective maintenance states for the component
Costs
CE(s, k)  Electricity price at stage k in electricity state s
CI  Cost per stage for interruption
CPM  Cost per stage of preventive maintenance
CCM  Cost per stage of corrective maintenance
CN(i)  Terminal cost if the component is in state i
Variables
i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage
State and Control Space
x1_k  Component state at stage k
x2_k  Electricity state at stage k
Probability functions
λ(t)  Failure rate of the component at age t
λ(Wi)  Failure rate of the component in state Wi
Sets
Ωx1  Component state space
Ωx2  Electricity state space
ΩU(i)  Decision space for state i
States notations
W  Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.
• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.
• If the system is not working, a cost for interruption CI per stage is considered.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),   x1_k ∈ Ωx1, x2_k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; Tmax can then, for example, correspond to the time after which λ(t) > 50%. This second approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
Figure 9.1: Example of the Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0 (transitions Wq → Wq+1 with probability 1 − Ts·λ(q) and Wq → CM1 with probability Ts·λ(q)); dashed lines: u = 1 (preventive maintenance).
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}
Electricity scenario state
Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3 (electricity prices in SEK/MWh, between about 200 and 500, for scenarios 1 to 3 over stages k−1, k, k+1).
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance,
Uk = 1: preventive maintenance.

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}; ∅ otherwise.
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).
The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.
53
Table 9.1: Transition probabilities

i1                          u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0    CM1      λ(Wq)
WNW                         0    WNW      1 − λ(WNW)
WNW                         0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1    PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    1
PM(NPM−1)                   ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    1
CM(NCM−1)                   ∅    W0       1
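The rows of Table 9.1 can be turned into a transition structure mechanically. The sketch below is illustrative (the state names as strings and the per-stage failure probabilities `lam` are my conventions): it builds the non-zero probabilities P(j1, u, i1), including the convention that PM1 or CM1 collapses to W0 when NPM = 1 or NCM = 1.

```python
def component_transitions(lam, N_W, N_PM, N_CM):
    """Transition probabilities following Table 9.1.
    lam[q] is the per-stage failure probability in state Wq, q = 0..N_W.
    Key (i1, u) maps to {j1: probability}; u = None encodes the empty
    decision space of PM/CM states."""
    P = {}
    PM1 = 'PM1' if N_PM > 1 else 'W0'   # PM1 collapses to W0 if N_PM = 1
    CM1 = 'CM1' if N_CM > 1 else 'W0'   # CM1 collapses to W0 if N_CM = 1
    for q in range(N_W + 1):
        nxt = f'W{min(q + 1, N_W)}'     # ageing, capped at W_NW
        P[(f'W{q}', 0)] = {nxt: 1 - lam[q], CM1: lam[q]}
        P[(f'W{q}', 1)] = {PM1: 1.0}    # preventive replacement
    for q in range(1, N_PM - 1):
        P[(f'PM{q}', None)] = {f'PM{q+1}': 1.0}
    if N_PM > 1:
        P[(f'PM{N_PM-1}', None)] = {'W0': 1.0}
    for q in range(1, N_CM - 1):
        P[(f'CM{q}', None)] = {f'CM{q+1}': 1.0}
    if N_CM > 1:
        P[(f'CM{N_CM-1}', None)] = {'W0': 1.0}
    return P

# Instance matching Figure 9.1: N_CM = 3, N_PM = 2, N_W = 4
lam = [0.1, 0.2, 0.3, 0.4, 0.5]   # hypothetical per-stage failure probabilities
P = component_transitions(lam, N_W=4, N_PM=2, N_CM=3)
print(P[('W0', 0)])      # {'W1': 0.9, 'CM1': 0.1}
print(P[('PM1', None)])  # {'W0': 1.0}
```

Each row of the resulting dictionary sums to one, which is a convenient sanity check on the table.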
Table 9.2: Example of transition matrices for the electricity scenarios

P1E = [1 0 0; 0 1 0; 0 0 1]
P2E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]
P3E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):   0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):  P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
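A sketch of this stage-dependent scenario transition structure, using the matrices of Table 9.2 and the schedule of Table 9.3 (the helper name is mine, not from the thesis):

```python
# Transition matrices from Table 9.2 (rows: current scenario i2, columns: j2)
P1E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2E = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
P3E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Stage-dependent choice of matrix from Table 9.3 (12-stage horizon):
# stable scenarios (P1E) most of the year, transient (P2E/P3E) in summer.
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def electricity_transition(k, i2, j2):
    """P_k(j2, i2): probability of moving from scenario i2 to j2 at stage k."""
    return schedule[k][i2][j2]

print(electricity_transition(3, 0, 0))  # 0.6, since stage 3 uses P3E
```

The non-stationarity is entirely captured by the schedule list; the component transitions remain stationary, so only this lookup needs the stage index.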
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• reward for electricity generation, G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k);
• cost for maintenance, CCM or CPM;
• cost for interruption, CI.

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                          u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0    CM1      CI + CCM
WNW                         0    WNW      G · Ts · CE(i2, k)
WNW                         0    CM1      CI + CCM
Wq                          1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    CI + CPM
PM(NPM−1)                   ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    CI + CCM
CM(NCM−1)                   ∅    W0       CI + CCM
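Putting together the transition probabilities (Table 9.1), the electricity scenarios and the transition costs (Table 9.4), the model can be solved by backward value iteration, J_k(i) = min_u E[C_k(j, u, i) + J_{k+1}(j)]. The sketch below is a simplified instance, not the thesis implementation: prices are constant per scenario (no stage dependence), NPM = 1, the terminal cost is zero, and the generation term is entered as a negative cost since it is a reward. All of these simplifications, and the numbers, are my assumptions.

```python
import itertools

def value_iteration(N, lam, PE, CE, G_Ts, C_I, C_PM, C_CM, N_CM):
    """Finite horizon backward recursion for the one-component model.
    lam[q] is the per-stage failure probability in Wq; PE is the
    scenario transition matrix; CE[s] the price per kWh in scenario s."""
    N_W = len(lam) - 1
    comp_states = [f'W{q}' for q in range(N_W + 1)] + \
                  [f'CM{q}' for q in range(1, N_CM)]

    def outcomes(i1, u, i2):
        """(probability, next component state, stage cost) triples."""
        if i1.startswith('CM'):
            q = int(i1[2:])
            j1 = f'CM{q + 1}' if q + 1 < N_CM else 'W0'
            return [(1.0, j1, C_I + C_CM)]
        q = int(i1[1:])
        if u == 1:                       # preventive replacement, N_PM = 1
            return [(1.0, 'W0', C_I + C_PM)]
        up = f'W{min(q + 1, N_W)}'
        fail_to = 'CM1' if N_CM > 1 else 'W0'
        return [(1 - lam[q], up, -G_Ts * CE[i2]),   # reward: negative cost
                (lam[q], fail_to, C_I + C_CM)]

    J = {(i1, s): 0.0 for i1 in comp_states for s in range(len(PE))}  # J_N = 0
    policy = {}
    for k in range(N - 1, -1, -1):
        Jk = {}
        for i1, i2 in itertools.product(comp_states, range(len(PE))):
            # Decision space: {0,1} for W1..W_NW, {0} for W0, empty otherwise
            controls = ([0] if i1 == 'W0' else [0, 1]) \
                       if i1.startswith('W') else [None]
            best_cost, best_u = None, None
            for u in controls:
                cost = sum(p * pe * (c + J[(j1, j2)])
                           for p, j1, c in outcomes(i1, u, i2)
                           for j2, pe in enumerate(PE[i2]))
                if best_cost is None or cost < best_cost:
                    best_cost, best_u = cost, u
            Jk[(i1, i2)] = best_cost
            policy[(k, i1, i2)] = best_u
        J = Jk
    return J, policy

# Hypothetical instance: N_W = 2, N_CM = 2, two price scenarios (high, low)
J, policy = value_iteration(N=12, lam=[0.05, 0.2, 0.6],
                            PE=[[0.8, 0.2], [0.2, 0.8]], CE=[0.5, 0.2],
                            G_Ts=1000.0, C_I=200.0, C_PM=100.0,
                            C_CM=400.0, N_CM=2)
print(policy[(0, 'W2', 0)], round(J[('W0', 0)], 1))
```

The returned policy is a lookup table over (stage, component state, scenario), which is exactly the form of solution the value iteration algorithm produces for this model.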
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.
This could be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC  Number of components
NWc  Number of working states for component c
NPMc  Number of preventive maintenance states for component c
NCMc  Number of corrective maintenance states for component c
Costs
CPMc  Cost per stage of preventive maintenance for component c
CCMc  Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}  State of component c at the current stage
iNC+1  Electricity state at the current stage
jc, c ∈ {1, ..., NC}  State of component c at the next stage
jNC+1  Electricity state at the next stage
uc, c ∈ {1, ..., NC}  Decision variable for component c
State and Control Space
xc_k, c ∈ {1, ..., NC}  State of component c at stage k
xc  A component state
xNC+1_k  Electricity state at stage k
uc_k  Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc  State space for component c
ΩxNC+1  Electricity state space
Ωuc(ic)  Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.
• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is done on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1_k, ..., xNC_k, xNC+1_k)    (9.2)

where xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.
Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for the one-component model.

The state space related to component c is denoted Ωxc:

xc_k ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}
Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether or not to do preventive maintenance, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c,
uc_k = 1: preventive maintenance on component c.
The decision variables constitute a decision vector:

Uk = (u1_k, u2_k, ..., uNC_k)    (9.3)
The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}; ∅ otherwise.
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0,

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)
Case 2
If one of the components is in maintenance, or a preventive maintenance decision is made:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with P^c =
  P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc},
  1             if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic,
  0             otherwise.
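The two cases can be combined into one routine that computes the joint probability as a product of per-component factors: working components of a stopped system contribute a factor 1 only if they keep their state. The per-component transition dictionaries and the exact freezing condition below are my reading of Case 2, a sketch rather than code from the thesis.

```python
def joint_transition(i, u, j, P_comp, working):
    """Joint probability P((j1..jNC), (u1..uNC), (i1..iNC)).
    P_comp[c] maps (ic, uc) -> {jc: prob} for component c alone
    (uc = None for PM/CM states); working(s) tests for a W state."""
    case1 = all(working(ic) for ic in i) and all(uc == 0 for uc in u)
    p = 1.0
    for c, (ic, uc, jc) in enumerate(zip(i, u, j)):
        if case1 or uc == 1 or not working(ic):
            key = uc if working(ic) else None
            p *= P_comp[c].get((ic, key), {}).get(jc, 0.0)
        else:  # working, unmaintained component: frozen while system is down
            p *= 1.0 if jc == ic else 0.0
    return p

# Hypothetical per-component transitions (shape of Table 9.1, NWc = 1)
P1 = {('W0', 0): {'W1': 0.9, 'CM1': 0.1},
      ('W1', 0): {'W1': 0.7, 'CM1': 0.3},
      ('W0', 1): {'W0': 1.0}, ('W1', 1): {'W0': 1.0},
      ('CM1', None): {'W0': 1.0}}
w = lambda s: s.startswith('W')

# Case 1: both components working, no maintenance -> independent product
print(joint_transition(('W0', 'W0'), (0, 0), ('W1', 'W1'), [P1, P1], w))
# Case 2: component 2 failed -> component 1 does not age
print(joint_transition(('W0', 'CM1'), (0, None), ('W0', 'W0'), [P1, P1], w))
```

The first call multiplies the two independent ageing probabilities; the second keeps the healthy component frozen at W0 while the failed one completes its repair.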
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0,

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)
Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with Cc =
  CCMc  if ic ∈ {CM1, ..., CM(NCMc−1)} or jc = CM1,
  CPMc  if ic ∈ {PM1, ..., PM(NPMc−1)} or jc = PM1,
  0     otherwise.
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be performed at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximation of a finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435-441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006 (RAMS'06), pages 464-469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75-83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156-162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467-476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15-24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157-179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75-82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452-456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1-23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411-435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533-537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179-186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387-393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150-155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145-149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of the 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507-515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117-118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006 (PMAPS 2006), International Conference on, pages 1-5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999 (PICA'99), Proceedings of the 21st IEEE International Conference, pages 31-37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems: life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223-229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems: cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293-294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556-560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006 (PMAPS 2006), International Conference on, pages 1-6, 2006.

[38] Alagar Rangan, Dimple Ahyagarajan, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176-1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167-173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006 (NAPS 2006), 38th North American, pages 23-28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469-489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants: introducing intelligent maintenance system. In Intelligent Control and Automation, 2006 (WCICA 2006), The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
4.2.3 A Simple Shortest Path Problem Example
Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.
An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:
[Figure: the shortest path network. Nodes per stage: Stage 0: A; Stage 1: B, C, D; Stage 2: E, F, G; Stage 3: H, I, J; Stage 4: K. Arc costs: A-B = 2, A-C = 4, A-D = 3; B-E = 4, B-F = 6; C-E = 2, C-F = 1, C-G = 3; D-F = 5, D-G = 2; E-H = 2, E-I = 5; F-H = 7, F-I = 3, F-J = 2; G-I = 1, G-J = 2; H-K = 4, I-K = 2, J-K = 7.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7 = 17. The shortest path would then be the one with the lowest cost.
Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.
4.2.3.1 Problem Formulation
The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.
State Space
The state space is defined for each stage:

Ω_X0 = {A} = {0}
Ω_X1 = {B, C, D} = {0, 1, 2}
Ω_X2 = {E, F, G} = {0, 1, 2}
Ω_X3 = {H, I, J} = {0, 1, 2}
Ω_X4 = {K} = {0}
Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the decision is which arc to take from the current node to a node of the next stage. The following notations are used:
Ω_Uk(i) = {0, 1} for i = 0, {0, 1, 2} for i = 1, {1, 2} for i = 2, for k = 1, 2

Ω_U0(0) = {0, 1, 2} for k = 0

Ω_U3(i) = {0} for all i, since node K is the only node at stage 4
For example, Ω_U1(0) = Ω_U(B) = {0, 1}, with u1(0) = 0 for the transition B ⇒ E and u1(0) = 1 for the transition B ⇒ F.

Another example: Ω_U1(2) = Ω_U(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F and u1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ0, μ1, ..., μN}, where μk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: fk(i, u) = u.
The transition costs are defined to be equal to the distance from one state to the resulting state of the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.
Objective Function

J*_0(0) = min_{Uk∈Ω_Uk(Xk)} [ Σ_{k=0}^{N−1} Ck(Xk, Uk) + CN(XN) ], with N = 4

subject to Xk+1 = fk(Xk, Uk), k = 0, 1, ..., N − 1
4.2.3.2 Solution
The value iteration algorithm is used to solve the problem
The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.
The solution of the algorithm is given in Appendix A.
The optimal cost-to-go is J*_0(0) = 8. It corresponds to the path A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ0, μ1, μ2, μ3, μ4} with μk(i) = u*_k(i) (for example, μ1(1) = 2 and μ1(2) = 2).
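The backward recursion of Appendix A can be reproduced with a short script. This is an illustrative sketch, not code from the thesis; the node names and arc costs are those of the figure above.

```python
# Value iteration for the deterministic shortest-path example.
# C[k][(i, u)] = cost of choosing next-stage state u from state i at stage k.
C = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},                 # A->B, A->C, A->D
    {(0, 0): 4, (0, 1): 6,                             # B->E, B->F
     (1, 0): 2, (1, 1): 1, (1, 2): 3,                  # C->E, C->F, C->G
     (2, 1): 5, (2, 2): 2},                            # D->F, D->G
    {(0, 0): 2, (0, 1): 5,                             # E->H, E->I
     (1, 0): 7, (1, 1): 3, (1, 2): 2,                  # F->H, F->I, F->J
     (2, 1): 1, (2, 2): 2},                            # G->I, G->J
    {(0, 0): 4, (1, 0): 2, (2, 0): 7},                 # H->K, I->K, J->K
]
N = 4
J = [dict() for _ in range(N + 1)]
policy = [dict() for _ in range(N)]
J[N][0] = 0  # terminal cost at node K

for k in range(N - 1, -1, -1):                         # backward recursion
    for i in {s for (s, u) in C[k]}:
        best_u, best_cost = min(
            ((u, c + J[k + 1][u]) for (s, u), c in C[k].items() if s == i),
            key=lambda t: t[1],
        )
        J[k][i], policy[k][i] = best_cost, best_u

print(J[0][0])                                         # optimal cost-to-go from A

# Recover the optimal path forwards from the optimal policy
names = [["A"], ["B", "C", "D"], ["E", "F", "G"], ["H", "I", "J"], ["K"]]
i, path = 0, ["A"]
for k in range(N):
    i = policy[k][i]
    path.append(names[k + 1][i])
print("-".join(path))
```

The printed cost 8 and the path A-D-G-I-K match the hand computation of Appendix A.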
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4: it depends on the current state and decision, but also on a stochastic variable describing the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as follows.
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ Ω_Xk.
Decision Space
At each decision epoch, the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω_Uk(i).
Dynamic of the System and Transition Probability
In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):
Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N − 1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:
Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:
P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:
Ck(j, u, i) = Ck(Xk+1 = j, Uk = u, Xk = i)
If the transition (i, j) occurs at stage k when the decision is u, then the cost Ck(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).
A terminal cost CN(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:

J*(X0) = min_{Uk∈Ω_Uk(Xk)} E[ CN(XN) + Σ_{k=0}^{N−1} Ck(Xk+1, Uk, Xk) ]

subject to Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, ..., N − 1
N: number of stages
k: stage
i: state at the current stage
j: state at the next stage
Xk: state at stage k
Uk: decision (action) at stage k
ωk(i, u): probabilistic function of the disturbance
Ck(j, u, i): cost function
CN(i): terminal cost for state i
fk(i, u, ω): dynamic function
J*_0(i): optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u∈Ω_Uk(i)} E[ Ck(i, u) + J*_{k+1}(fk(i, u, ω)) ]   (5.1)
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
J*_k(i) = min_{u∈Ω_Uk(i)} Σ_{j∈Ω_Xk+1} Pk(j, u, i) · [Ck(j, u, i) + J*_{k+1}(j)]   (5.2)
Ω_Xk: state space at stage k
Ω_Uk(i): decision space at stage k for state i
Pk(j, u, i): transition probability function
5.3 Value Iteration Method
The value iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
J*_N(i) = CN(i), ∀i ∈ Ω_XN (initialisation)

While k ≥ 0 do:

J*_k(i) = min_{u∈Ω_Uk(i)} Σ_{j∈Ω_Xk+1} Pk(j, u, i) · [Ck(j, u, i) + J*_{k+1}(j)], ∀i ∈ Ω_Xk

U*_k(i) = argmin_{u∈Ω_Uk(i)} Σ_{j∈Ω_Xk+1} Pk(j, u, i) · [Ck(j, u, i) + J*_{k+1}(j)], ∀i ∈ Ω_Xk

k ← k − 1
u: decision variable
U*_k(i): optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
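The recursion above can be sketched in a few lines of Python. The two-state component model below (0 = working, 1 = failed, with actions "do nothing" and "replace") and all its numbers are invented for the illustration; they are not taken from the thesis.

```python
# Finite-horizon stochastic value iteration sketch on an assumed
# two-state component model (0 = working, 1 = failed).

N = 3                             # number of decision stages
states = [0, 1]
actions = {0: [0, 1], 1: [1]}     # 0 = do nothing, 1 = replace (forced when failed)

def P(j, u, i):
    """Stationary transition probability P(X_{k+1} = j | X_k = i, U_k = u)."""
    if u == 1:                    # replacement always restores the component
        return 1.0 if j == 0 else 0.0
    return {0: 0.7, 1: 0.3}[j]    # do nothing: fails with probability 0.3

def C(j, u, i):
    """Transition cost: replacement cost plus a penalty for failing."""
    return (20.0 if u == 1 else 0.0) + (50.0 if j == 1 else 0.0)

CN = {0: 0.0, 1: 0.0}             # terminal costs

J = dict(CN)                      # J_N
policy = []
for k in range(N - 1, -1, -1):    # backward recursion over the stages
    Jk, uk = {}, {}
    for i in states:
        vals = {u: sum(P(j, u, i) * (C(j, u, i) + J[j]) for j in states)
                for u in actions[i]}
        uk[i] = min(vals, key=vals.get)   # argmin over admissible actions
        Jk[i] = vals[uk[i]]
    policy.insert(0, uk)
    J = Jk

print(J, policy)
```

With these assumed numbers, replacing a working component is never worthwhile over three stages, so the computed policy only replaces after a failure.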
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:

• N stages,

• NX state variables, where the size of the set for each state variable is S,

• NU control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·NX) · A^NU). The complexity of the problem thus increases exponentially with the size of the problem (the number of state and decision variables). This characteristic of SDP is called the curse of dimensionality.
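The growth can be illustrated with a back-of-the-envelope count; the problem sizes below are arbitrary examples, not data from the thesis.

```python
# Illustration of the O(N * S**(2*NX) * A**NU) operation count of
# finite horizon value iteration (example sizes are assumptions).

def vi_operations(N, S, NX, A, NU):
    """Rough operation count of finite horizon value iteration."""
    return N * S ** (2 * NX) * A ** NU

# one component: 10 deterioration states, 2 actions, 52 weekly stages
small = vi_operations(52, 10, 1, 2, 1)
# five identical components optimized jointly
large = vi_operations(52, 10, 5, 2, 5)
print(small, large, large // small)
```

Going from one to five jointly optimized components multiplies the work by a factor of about 1.6 · 10^9: the curse of dimensionality in action.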
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be taken into account to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.
Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbances a system is, or can be, subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model on its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is no consumption, some generation units are stopped, and this time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing the maintenance actions of offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamic of the system only depends on the current state of the system (and possibly on time, if the system dynamic is not stationary).
This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage; it would give information about the dynamic of the deterioration process.
Chapter 6
Infinite Horizon Models -
Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamic of the system, the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov decision process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice one scarcely faces problems with an infinite number of stages. Infinite horizon models can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω_X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω_U(i).
The objective is to find the optimal policy μ*; it should minimize the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are incurred.

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ...
μ: decision policy
J*(i): optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1): the cost incurred at stage k has the form α^k · Cij(u). As Cij(u) is bounded, the infinite sum converges (it is bounded by a decreasing geometric progression).

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ...
α: discount factor
Average cost per stage problems
Some infinite horizon problems can neither be represented with a cost-free termination state nor be discounted. To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ...
6.2 Optimality Equations
The optimality equations are formulated using the probability function P(j, u, i).
The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of the Bellman equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)], ∀i ∈ Ω_X

Jμ(i): cost-to-go function of policy μ starting from state i
J*(i): optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative criterion must be determined to stop the algorithm.

An alternative to this method is the policy iteration (PI) algorithm. The latter terminates after a finite number of iterations.
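For the discounted case, value iteration can be sketched as below. The two-state repair model (0 = working, 1 = failed) and all its transition probabilities and costs are invented assumptions for the illustration.

```python
# Value iteration for a discounted infinite-horizon MDP on an assumed
# two-state repair model; alpha is the discount factor.

states = [0, 1]
actions = [0, 1]                  # 0 = do nothing, 1 = repair
alpha = 0.9

P = {  # P[(j, u, i)] = P(next state = j | current state = i, action = u)
    (0, 0, 0): 0.8, (1, 0, 0): 0.2,
    (0, 0, 1): 0.0, (1, 0, 1): 1.0,   # without repair, a failed unit stays failed
    (0, 1, 0): 1.0, (1, 1, 0): 0.0,
    (0, 1, 1): 1.0, (1, 1, 1): 0.0,
}

def C(j, u, i):
    return (20.0 if u == 1 else 0.0) + (50.0 if j == 1 else 0.0)

def bellman(J, i, u):
    return sum(P[(j, u, i)] * (C(j, u, i) + alpha * J[j]) for j in states)

J = {i: 0.0 for i in states}
for _ in range(1000):             # successive approximation of the fixed point
    J = {i: min(bellman(J, i, u) for u in actions) for i in states}

mu = {i: min(actions, key=lambda u: bellman(J, i, u)) for i in states}
print(J, mu)
```

The iteration contracts with factor alpha, so 1000 sweeps are far more than enough here; the resulting policy repairs only when the unit has failed.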
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively; the process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ0. It can then be described by the following steps:
Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Otherwise, J_{μq}(i), the solution of the following linear system, is calculated:

J_{μq}(i) = Σ_{j∈Ω_X} P(j, μq(i), i) · [C(j, μq(i), i) + J_{μq}(j)], ∀i ∈ Ω_X

q: iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μq.
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{μq}(j)], ∀i ∈ Ω_X

Go back to the policy evaluation step.
The process stops when μ_{q+1} = μ_q. At each iteration the algorithm improves the policy. If the initial policy μ0 is already good, the algorithm therefore converges quickly to the optimal solution.
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, in each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the cost-to-go function of the policy. The algorithm is initialized with a value function J^M_{μk}(i) that must be chosen higher than the real value J_{μk}(i).
While m ≥ 0 do:

J^m_{μk}(i) = Σ_{j∈Ω_X} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_{μk}(j)], ∀i ∈ Ω_X

m ← m − 1

m: number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μk} is approximated by J^0_{μk}.
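The approximate evaluation step can be sketched as below, on an invented two-state repair model (all numbers are assumptions for the illustration, not thesis data).

```python
# Modified policy iteration sketch: the exact policy evaluation is replaced
# by M value iteration sweeps under the fixed policy mu.

states = [0, 1]                   # 0 = working, 1 = failed
alpha = 0.9
P = {
    (0, 0, 0): 0.8, (1, 0, 0): 0.2,
    (0, 0, 1): 0.0, (1, 0, 1): 1.0,
    (0, 1, 0): 1.0, (1, 1, 0): 0.0,
    (0, 1, 1): 1.0, (1, 1, 1): 0.0,
}
def C(j, u, i):
    return (20.0 if u == 1 else 0.0) + (50.0 if j == 1 else 0.0)

def evaluate_approx(mu, M, J_init):
    """M sweeps of value iteration under the fixed policy mu."""
    J = dict(J_init)
    for _ in range(M):
        J = {i: sum(P[(j, mu[i], i)] * (C(j, mu[i], i) + alpha * J[j])
                    for j in states) for i in states}
    return J

mu = {0: 0, 1: 1}                 # policy to evaluate: repair only on failure
# the initial guess is chosen above the true cost-to-go, as required
J_high = {0: 500.0, 1: 500.0}
J20 = evaluate_approx(mu, 20, J_high)
J200 = evaluate_approx(mu, 200, J_high)
print(J20, J200)
```

Starting above the true value, the sweeps approach the exact evaluation from above: with M = 200 the approximation is essentially exact, while with M = 20 it is still a clear overestimate.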
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain, that is, if all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states (see [36] for details).
Given a stationary policy μ and a state X ∈ Ω_X, there is a unique λμ and a vector hμ such that:

hμ(X) = 0

λμ + hμ(i) = Σ_{j∈Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)], ∀i ∈ Ω_X
This λμ is the average cost-to-go of the stationary policy μ; the average cost-to-go is the same for all starting states.
The optimal average cost and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X

μ*(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. A reference state X is chosen arbitrarily and h0(i) is initialized arbitrarily:

Hk = min_{u∈Ω_U(X)} Σ_{j∈Ω_X} P(j, u, X) · [C(j, u, X) + hk(j)]

h_{k+1}(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk, ∀i ∈ Ω_X

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + hk(j)], ∀i ∈ Ω_X
The sequence hk converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
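The recursion can be sketched as below. The two-state repair model and its numbers are invented assumptions; repair is made the only admissible action in the failed state so that every stationary policy yields a unichain Markov chain, as the convergence condition requires.

```python
# Relative value iteration sketch for the average cost per stage criterion,
# on an assumed two-state repair model (0 = working, 1 = failed).

states = [0, 1]
actions = {0: [0, 1], 1: [1]}     # 0 = do nothing, 1 = repair (forced when failed)
X_ref = 0                         # arbitrary reference state (X in the text)

P = {  # P[(j, u, i)] = P(next = j | state = i, action = u)
    (0, 0, 0): 0.8, (1, 0, 0): 0.2,
    (0, 1, 0): 1.0, (1, 1, 0): 0.0,
    (0, 1, 1): 1.0, (1, 1, 1): 0.0,
}
def C(j, u, i):
    return (20.0 if u == 1 else 0.0) + (50.0 if j == 1 else 0.0)

h = {i: 0.0 for i in states}
lam = 0.0
for _ in range(200):
    T = {i: min(sum(P[(j, u, i)] * (C(j, u, i) + h[j]) for j in states)
                for u in actions[i]) for i in states}
    lam = T[X_ref]                # converges to the optimal average cost
    h = {i: T[i] - lam for i in states}

mu = {i: min(actions[i], key=lambda u: sum(
        P[(j, u, i)] * (C(j, u, i) + h[j]) for j in states)) for i in states}
print(lam, h, mu)
```

For these numbers the iteration settles on an optimal average cost of 35/3 per stage under the policy "repair only on failure", with h normalized to zero at the reference state.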
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: the reference state X and an initial policy μ0 are chosen arbitrarily.
Step 1: Policy Evaluation
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω_X, stop the algorithm. Otherwise, solve the system of equations:

hq(X) = 0

λq + hq(i) = Σ_{j∈Ω_X} P(j, μq(i), i) · [C(j, μq(i), i) + hq(j)], ∀i ∈ Ω_X
Step 2: Policy Improvement

μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + hq(j)], ∀i ∈ Ω_X

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
34
For example, in the discounted IHSDP case the optimal cost-to-go function satisfies

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X

and J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω_X} J(i)

Subject to J(i) ≤ Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J(j)], ∀i ∈ Ω_X, ∀u ∈ Ω_U(i)
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Processes
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or a decision epoch can occur each time the state of the system changes. These kinds of problems refer to semi-Markov decision processes (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and the actions are not taken continuously (problems with continuous actions refer to optimal control theory).
SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how an SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement learning (RL), or approximate dynamic programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented; they make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches was using artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided from simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ, using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average-cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy µ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, µ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from the state X_k is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): cost-to-go of a trajectory starting from state X_k.
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): cost-to-go of the trajectory starting from state i after the m-th visit.
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.
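As an illustration, the policy evaluation by simulation described above can be sketched in a few lines of Python. The chain, the costs and the fixed policy below are hypothetical toy values, not taken from the thesis; state 2 is the terminal state, and γ = 1/m is the running-average step size defined above.

```python
import random

# Hypothetical stochastic-shortest-path chain under a fixed policy:
# states 0 and 1 are transient, state 2 is terminal and absorbing.
P = {0: [(1, 0.8), (2, 0.2)], 1: [(2, 1.0)]}   # (next state, probability)
C = {(0, 1): 1.0, (0, 2): 4.0, (1, 2): 2.0}    # transition costs

def simulate(start):
    """Generate one trajectory: the visited states and the observed costs."""
    states, costs, s = [start], [], start
    while s != 2:
        nxt = random.choices([j for j, _ in P[s]], [p for _, p in P[s]])[0]
        costs.append(C[(s, nxt)])
        states.append(nxt)
        s = nxt
    return states, costs

random.seed(0)
J = {0: 0.0, 1: 0.0, 2: 0.0}        # cost-to-go estimates
visits = {0: 0, 1: 0}
for _ in range(20000):
    states, costs = simulate(0)
    for k in range(len(states) - 1):            # every state visited
        v = sum(costs[k:])                      # observed cost-to-go V(X_k)
        visits[states[k]] += 1
        gamma = 1.0 / visits[states[k]]         # step size 1/m
        J[states[k]] += gamma * (v - J[states[k]])
```

For this toy chain the exact values are J(1) = 2 and J(0) = 0.8 · (1 + 2) + 0.2 · 4 = 3.2, which the estimates approach as more trajectories are used.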
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, and can thus be used only when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).
At each transition of the trajectory, the cost-to-go function J(X_k) is updated for all the states that have been visited previously during the trajectory. Assuming that the l-th transition has just been generated, then

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
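A corresponding sketch of the TD(0) update, on a small hypothetical absorbing chain (states, costs and probabilities are illustrative only): after each observed transition, J(X_k) is moved toward the one-step target C(X_k, X_{k+1}) + J(X_{k+1}).

```python
import random

# Hypothetical chain: states 0 and 1 transient, state 2 terminal (J fixed at 0).
P = {0: [(1, 0.8), (2, 0.2)], 1: [(2, 1.0)]}   # (next state, probability)
C = {(0, 1): 1.0, (0, 2): 4.0, (1, 2): 2.0}    # transition costs

random.seed(1)
J = {0: 0.0, 1: 0.0, 2: 0.0}
visits = {0: 0, 1: 0}
for _ in range(50000):
    s = 0
    while s != 2:
        nxt = random.choices([j for j, _ in P[s]], [p for _, p in P[s]])[0]
        visits[s] += 1
        gamma = 1.0 / visits[s]
        # TD(0): move J(X_k) toward the one-step target C + J(X_{k+1})
        J[s] += gamma * (C[(s, nxt)] + J[nxt] - J[s])
        s = nxt
```

Unlike the trajectory-based update, each transition is used as soon as it is observed; for this chain the estimates again approach J(1) = 2 and J(0) = 3.2.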
Q-factors. Once J^{µ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^{µ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J^{µ_k}(j)]

Note that P(j, u, i) and C(j, u, i) must be known. The improved policy is

µ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q^{µ_k}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J^{µ_k} and Q^{µ_k} have been estimated using the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]     (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)     (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]     (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_k, X_{k+1}, U_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
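A minimal Q-learning sketch on a made-up shortest-path MDP (the states, actions and costs are hypothetical and only serve to exercise the update above); action selection is ε-greedy, and the step size γ decreases as 1/m per state-action pair.

```python
import random

# Hypothetical MDP: states 0 and 1 transient, state 2 terminal.
ACTIONS = {0: ['a', 'b'], 1: ['a']}
def step(s, u):
    """Deterministic toy dynamics: (next state, cost)."""
    table = {(0, 'a'): (2, 2.0), (0, 'b'): (1, 1.0), (1, 'a'): (2, 0.5)}
    return table[(s, u)]

random.seed(2)
Q = {(s, u): 0.0 for s in ACTIONS for u in ACTIONS[s]}
counts = dict.fromkeys(Q, 0)
EPS = 0.2                                       # exploration probability
for _ in range(2000):
    s = 0
    while s != 2:
        greedy = min(ACTIONS[s], key=lambda u: Q[(s, u)])
        u = random.choice(ACTIONS[s]) if random.random() < EPS else greedy
        nxt, c = step(s, u)
        counts[(s, u)] += 1
        gamma = 1.0 / counts[(s, u)]            # decreasing step size
        target = c + (0.0 if nxt == 2 else min(Q[(nxt, v)] for v in ACTIONS[nxt]))
        Q[(s, u)] += gamma * (target - Q[(s, u)])
        s = nxt
```

For these toy numbers the Q-factors converge to Q(0, 'a') = 2, Q(0, 'b') = 1.5 and Q(1, 'a') = 0.5, so the greedy policy in state 0 is 'b'.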
The trade-off between exploration and exploitation. Convergence of the algorithms to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic. In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning
An on-line application can take advantage of the experience gained from real-time use by:

• using the direct learning approach presented in the preceding section for each sample of experience;

• building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation using direct learning.
7.4 Supervised Learning
With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in a tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J^µ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J^µ. In the table representation investigated previously, J^µ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.
Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^µ(i) − J̃(i, r).
There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function, and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
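The idea of storing only a parameter vector r instead of a table can be illustrated with a simple least-squares approximator. Everything below is a hypothetical stand-in: the "true" cost-to-go, the sampled states and the quadratic feature vector are invented for the sketch.

```python
import numpy as np

# Approximate a cost-to-go over states 0..99 with a linear architecture
# J_tilde(i, r) = r0 + r1*i + r2*i**2, fitted to noisy samples of J at
# a limited set of visited states (made-up numbers throughout).
rng = np.random.default_rng(0)
true_J = lambda i: 50.0 - 0.4 * i + 0.003 * i ** 2      # stands in for J^mu

visited = rng.integers(0, 100, size=200)                # sampled states
targets = true_J(visited) + rng.normal(0.0, 0.5, 200)   # noisy observed returns

Phi = np.stack([np.ones(len(visited)), visited, visited ** 2], axis=1)
r, *_ = np.linalg.lstsq(Phi, targets, rcond=None)       # fit the vector r

# Generalization: predict the cost-to-go at a state never sampled explicitly
approx = lambda i: r @ np.array([1.0, i, i ** 2])
```

Only the three numbers in r are stored, instead of one table entry per state, and the fitted function generalizes to states absent from the training set.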
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built, using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature were considering only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Advantages / disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model
- Methods: classical methods for MDP; several approaches are possible:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; policy iteration (PI) is faster in general
  - Shortest path: linear programming allows possible additional constraints; state space limited for VI and PI

Approximate Dynamic Programming for MDP
- Characteristics: can handle large state spaces
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application: optimization for inspection-based maintenance
- Methods: same as MDP
- Disadvantages: complex (average cost-to-go approach)
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both these models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices before doing maintenance.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions for the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
N_E: Number of electricity scenarios
N_W: Number of working states for the component
N_PM: Number of preventive maintenance states for one component
N_CM: Number of corrective maintenance states for one component
Costs
C_E(s, k): Electricity cost at stage k for the electricity state s
C_I: Cost per stage for interruption
C_PM: Cost per stage of preventive maintenance
C_CM: Cost per stage of corrective maintenance
C_N(i): Terminal cost if the component is in state i
Variables
i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage
State and Control Space
x1_k: Component state at stage k
x2_k: Electricity state at stage k
Probability function
λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state W_i
Sets
Ω_x1: Component state space
Ω_x2: Electricity state space
Ω_U(i): Decision space for state i
State notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector X_k is composed of two state variables: x1_k for the state of the component (its age), and x2_k for the electricity scenario; N_X = 2. The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k)ᵀ,  x1_k ∈ Ω_x1, x2_k ∈ Ω_x2     (9.1)

Ω_x1 is the set of possible states for the component, and Ω_x2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.
To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when the age Tmax is reached; in this case, Tmax can for example correspond to the time when λ(t) > 50% for t > Tmax. This second approach was implemented. The corresponding number of W states is N_W = Tmax/Ts, or the closest integer, in both cases.
[Figure 9.1: Example of the Markov decision process for one component, with N_CM = 3, N_PM = 2, N_W = 4. Working states W0-W4 age with probability 1 − Ts·λ(q) and fail with probability Ts·λ(q); maintenance states progress with probability 1. Solid lines: u = 0; dashed lines: u = 1.]
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}
Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country like Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2: Example of electricity price scenarios (prices in SEK/MWh), N_E = 3.]
52
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW}, ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · P_k(j2, i2)
Component state transition probabilities

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1. Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1 | u | j1 | P(j1, u, i1)
Wq, q ∈ {0, ..., N_W−1} | 0 | W_{q+1} | 1 − λ(Wq)
Wq, q ∈ {0, ..., N_W−1} | 0 | CM1 | λ(Wq)
W_NW | 0 | W_NW | 1 − λ(W_NW)
W_NW | 0 | CM1 | λ(W_NW)
Wq, q ∈ {0, ..., N_W} | 1 | PM1 | 1
PMq, q ∈ {1, ..., N_PM−2} | ∅ | PM_{q+1} | 1
PM_{NPM−1} | ∅ | W0 | 1
CMq, q ∈ {1, ..., N_CM−2} | ∅ | CM_{q+1} | 1
CM_{NCM−1} | ∅ | W0 | 1
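A small sketch of how the stationary component transitions of Table 9.1 could be encoded; the sizes N_W, N_PM, N_CM and the per-stage failure probability λ(Wq) below are made-up example values.

```python
# Hypothetical sizes and a made-up per-stage failure probability.
N_W, N_PM, N_CM = 4, 2, 3
states = ([f"W{q}" for q in range(N_W + 1)]
          + [f"PM{q}" for q in range(1, N_PM)]
          + [f"CM{q}" for q in range(1, N_CM)])

def lam(q):
    """Made-up per-stage failure probability lambda(W_q), increasing with age."""
    return 0.02 * (q + 1)

def transitions(state, u):
    """Return {next state: probability}, following Table 9.1 (u ignored when forced)."""
    if state.startswith("W"):
        q = int(state[1:])
        if u == 1:                                   # preventive replacement
            return {"PM1" if N_PM > 1 else "W0": 1.0}
        nxt = f"W{min(q + 1, N_W)}"                  # W_NW does not age further
        return {nxt: 1.0 - lam(q), "CM1": lam(q)}
    kind, q = state[:2], int(state[2:])
    last = (N_PM if kind == "PM" else N_CM) - 1
    return {f"{kind}{q + 1}" if q < last else "W0": 1.0}

# Every row of the implied transition matrix sums to one:
assert all(abs(sum(transitions(s, 0).values()) - 1.0) < 1e-12 for s in states)
```

The note below Table 9.1 is reflected in the `N_PM > 1` test: with a single preventive maintenance stage, PM1 collapses to W0.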
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E =
[ 1   0   0
  0   1   0
  0   0   1 ]

P2_E =
[ 1/3 1/3 1/3
  1/3 1/3 1/3
  1/3 1/3 1/3 ]

P3_E =
[ 0.6 0.2 0.2
  0.2 0.6 0.2
  0.2 0.2 0.6 ]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
P_k(j2, i2):  P1_E P1_E P1_E P3_E P3_E P2_E P2_E P2_E P3_E P1_E P1_E P1_E
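The stage-dependent electricity transitions of Tables 9.2 and 9.3 can be encoded and sampled as follows; this is an illustrative sketch only, with scenario indices 0, 1, 2 standing for S1, S2, S3.

```python
import random

# The three transition matrices of Table 9.2 (rows: i2, columns: j2).
P1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2 = [[1 / 3] * 3 for _ in range(3)]
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# The 12-stage schedule of Table 9.3, one matrix per stage.
schedule = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

def sample_scenario_path(start, rng):
    """Sample a scenario trajectory s_0, ..., s_12 over the 12-stage horizon."""
    path = [start]
    for Pk in schedule:
        path.append(rng.choices([0, 1, 2], Pk[path[-1]])[0])
    return path

path = sample_scenario_path(0, random.Random(0))
```

During the first three stages the identity matrix P1_E keeps the scenario fixed; the scenario can only change once a stage governed by P2_E or P3_E is reached.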
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation, = G · Ts · C_E(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance, C_CM or C_PM

• Cost for interruption, C_I

Moreover, a terminal cost, noted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4; notice that i2 is a state variable. A possible terminal cost is defined by C_N(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1 | u | j1 | C_k(j, u, i)
Wq, q ∈ {0, ..., N_W−1} | 0 | W_{q+1} | G · Ts · C_E(i2, k)
Wq, q ∈ {0, ..., N_W−1} | 0 | CM1 | C_I + C_CM
W_NW | 0 | W_NW | G · Ts · C_E(i2, k)
W_NW | 0 | CM1 | C_I + C_CM
Wq | 1 | PM1 | C_I + C_PM
PMq, q ∈ {1, ..., N_PM−2} | ∅ | PM_{q+1} | C_I + C_PM
PM_{NPM−1} | ∅ | W0 | C_I + C_PM
CMq, q ∈ {1, ..., N_CM−2} | ∅ | CM_{q+1} | C_I + C_CM
CM_{NCM−1} | ∅ | W0 | C_I + C_CM
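Putting the pieces together, the one-component model can be solved by backward induction (finite horizon value iteration). The sketch below uses small made-up numbers: N = 6 stages, N_W = 3, N_PM = N_CM = 2 (so a single PM and a single CM state), two electricity scenarios with stationary transitions for simplicity, and illustrative costs; production rewards are entered as negative costs and the terminal cost is taken as zero.

```python
import itertools

# Made-up instance of the one-component model.
N, N_W = 6, 3
G_Ts = 100.0                                    # kWh produced per working stage
C_I, C_PM, C_CM = 50.0, 30.0, 80.0              # interruption / maintenance costs
C_E = {0: lambda k: 0.3, 1: lambda k: 0.5}      # price per kWh in each scenario
P_E = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}   # stationary here, for brevity
lam = {q: 0.05 * (q + 1) for q in range(N_W + 1)}  # per-stage failure probability

comp_states = [f"W{q}" for q in range(N_W + 1)] + ["PM1", "CM1"]

def comp_transitions(i1, u):
    if i1.startswith("W"):
        q = int(i1[1:])
        if u == 1:                               # preventive replacement
            return {"PM1": 1.0}
        return {f"W{min(q + 1, N_W)}": 1.0 - lam[q], "CM1": lam[q]}
    return {"W0": 1.0}                           # one remaining maintenance stage

def cost(i1, u, j1, i2, k):
    if i1.startswith("PM") or u == 1:
        return C_I + C_PM
    if i1.startswith("CM") or j1 == "CM1":
        return C_I + C_CM
    return -G_Ts * C_E[i2](k)                    # reward for the energy produced

J = {(i1, i2): 0.0 for i1 in comp_states for i2 in P_E}   # zero terminal cost
policy = {}
for k in reversed(range(N)):                     # backward induction
    Jk = {}
    for i1, i2 in itertools.product(comp_states, P_E):
        controls = [0, 1] if i1.startswith("W") else [0]
        for u in controls:
            v = sum(p1 * p2 * (cost(i1, u, j1, i2, k) + J[(j1, j2)])
                    for j1, p1 in comp_transitions(i1, u).items()
                    for j2, p2 in P_E[i2].items())
            if u == controls[0] or v < Jk[(i1, i2)]:
                Jk[(i1, i2)], policy[(k, i1, i2)] = v, u
    J = Jk
```

The resulting policy prescribes, for each stage, component age and price scenario, whether preventive replacement pays off; with these particular numbers a new component (state W0) is never preventively replaced, and the expected cost from a high-price scenario is lower (more profit) than from a low-price one.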
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.
This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price for their rent can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
N_C: Number of components
N_Wc: Number of working states for component c
N_PMc: Number of preventive maintenance states for component c
N_CMc: Number of corrective maintenance states for component c
Costs

C_PMc: Cost per stage of preventive maintenance for component c
C_CMc: Cost per stage of corrective maintenance for component c
C_Nc(i): Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., N_C}: State of component c at the current stage
i_{NC+1}: State of the electricity at the current stage
jc, c ∈ {1, ..., N_C}: State of component c for the next stage
j_{NC+1}: State of the electricity for the next stage
uc, c ∈ {1, ..., N_C}: Decision variable for component c
State and Control Space
xc_k, c ∈ {1, ..., N_C}: State of component c at stage k
xc: A component state
x^{NC+1}_k: Electricity state at stage k
uc_k: Maintenance decision for component c at stage k
Probability functions
λc(i): Failure probability function for component c
Sets
Ω_xc: State space for component c
Ω_x{NC+1}: Electricity state space
Ω_uc(ic): Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages, with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component, to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.

• An interruption cost C_I is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

X_k = (x1_k, ..., x^{NC}_k, x^{NC+1}_k)ᵀ     (9.2)

xc_k, c ∈ {1, ..., N_C}, represents the state of component c, and x^{NC+1}_k represents the electricity state.
Component space

The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is noted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or to do nothing, depending on the state of the system:

u_{c,k} = 0: no preventive maintenance on component c
u_{c,k} = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u_{1,k}, u_{2,k}, …, u_{N_C,k})^T    (9.3)

The decision space for each decision variable is defined by:

∀c ∈ {1, …, N_C}: Ω_{u_c}(i_c) = {0, 1} if i_c ∈ {W0, …, W_{N_{W_c}}}, and Ω_{u_c}(i_c) = ∅ otherwise.
9.2.4.3 Transition Probability

The component state variables x_c are independent of the electricity state x_{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)    (9.4)
  = P((j_1, …, j_{N_C}) | (u_1, …, u_{N_C}), (i_1, …, i_{N_C})) · P(j_{N_C+1} | i_{N_C+1})    (9.5)

The transition probabilities of the electricity states, P(j_{N_C+1} | i_{N_C+1}), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 8.1.

Component state transitions

The state variables x_c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. Consequently, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of the components considered independently:

If ∀c ∈ {1, …, N_C}: i_c ∈ {W1, …, W_{N_{W_c}}}, then

P((j_1, …, j_{N_C}) | 0, (i_1, …, i_{N_C})) = ∏_{c=1}^{N_C} P(j_c | 0, i_c)
Case 2
If one of the components is in maintenance, or if a preventive maintenance decision is taken, then

P((j_1, …, j_{N_C}) | (u_1, …, u_{N_C}), (i_1, …, i_{N_C})) = ∏_{c=1}^{N_C} P^c

with

P^c = P(j_c | 1, i_c)  if u_c = 1 or i_c ∉ {W1, …, W_{N_{W_c}}}
P^c = 1                if u_c = 0, i_c ∈ {W1, …, W_{N_{W_c}}} and j_c = i_c
P^c = 0                otherwise
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens; a reward for the electricity produced is obtained:

If ∀c ∈ {1, …, N_C}: i_c ∈ {W1, …, W_{N_{W_c}}}, then

C((j_1, …, j_{N_C}), 0, (i_1, …, i_{N_C})) = G · T_s · C_E(i_{N_C+1}, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

C((j_1, …, j_{N_C}), (u_1, …, u_{N_C}), (i_1, …, i_{N_C})) = C_I + ∑_{c=1}^{N_C} C^c

with

C^c = C_{CM_c}  if i_c ∈ {CM1, …, CM_{N_{CM_c}}} or j_c = CM1
C^c = C_{PM_c}  if i_c ∈ {PM1, …, PM_{N_{PM_c}}} or j_c = PM1
C^c = 0         otherwise
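The two-case transition law above can be made concrete with a short sketch for a hypothetical two-component series system. All state names, probabilities and the one-component transition function p_c below are invented for illustration; they are not values from the thesis model.

```python
import itertools

# Illustrative assumptions: a component with working states W1, W2 and one
# corrective-maintenance state CM1; replacement/repair returns it to W1.

WORKING = ("W1", "W2")                 # ageing (working) states
ALL_STATES = ("W1", "W2", "CM1")

def p_c(j, u, i):
    """Hypothetical one-component transition probability P(j | u, i)."""
    if u == 1 or i not in WORKING:     # replacement decided, or already in CM
        return 1.0 if j == "W1" else 0.0   # back to the newest working state
    table = {"W1": {"W2": 0.9, "CM1": 0.1},    # made-up failure probabilities
             "W2": {"W2": 0.7, "CM1": 0.3}}
    return table[i].get(j, 0.0)

def p_joint(j_vec, u_vec, i_vec):
    """Joint transition probability for the series system (cases 1 and 2)."""
    if all(i in WORKING for i in i_vec) and not any(u_vec):
        prob = 1.0                     # case 1: components age independently
        for j, i in zip(j_vec, i_vec):
            prob *= p_c(j, 0, i)
        return prob
    prob = 1.0                         # case 2: the system is stopped
    for j, u, i in zip(j_vec, u_vec, i_vec):
        if u == 1 or i not in WORKING:     # maintained/failed parts evolve
            prob *= p_c(j, 1, i)
        else:                              # untouched working parts freeze
            prob *= 1.0 if j == i else 0.0
    return prob

# sanity check: successor probabilities sum to one from every (i, u)
for i_vec in itertools.product(ALL_STATES, repeat=2):
    for u_vec in itertools.product((0, 1), repeat=2):
        total = sum(p_joint(j_vec, u_vec, i_vec)
                    for j_vec in itertools.product(ALL_STATES, repeat=2))
        assert abs(total - 1.0) < 1e-9
```

The final loop checks that both cases define a proper probability distribution over successor states, which is the property the case split is designed to preserve.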
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be carried out at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the recent advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields, such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximation of a finite horizon model but must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
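The hand computation above can be cross-checked with a short script. It runs the same backward recursion, with the arc costs C_k(i, u) transcribed from the expressions above (the decision u is the index of the successor node); state indices 0, 1, 2 at stages 0 to 3 correspond to the nodes A; B, C, D; E, F, G; and H, I, J.

```python
# Backward value iteration for the shortest-path example; verifies the
# optimal cost-to-go J*_0(0) = 8 and the path A => D => G => I => K.

# costs[k][i] maps a decision u (successor state index) to the arc cost
costs = [
    {0: {0: 2, 1: 4, 2: 3}},                                    # stage 0: A
    {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},  # B, C, D
    {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},  # E, F, G
    {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                          # H, I, J
]

J = {0: 0}                      # terminal stage: J*_4(0) = 0 (node K)
policy = []
for k in range(len(costs) - 1, -1, -1):
    Jk, uk = {}, {}
    for i, arcs in costs[k].items():
        u_best = min(arcs, key=lambda u: arcs[u] + J[u])
        uk[i] = u_best
        Jk[i] = arcs[u_best] + J[u_best]
    policy.insert(0, uk)
    J = Jk

print(J[0])                     # → 8, the optimal cost-to-go from A

# follow the optimal decisions forward from state 0 (node A)
state, path = 0, [0]
for uk in policy:
    state = uk[state]
    path.append(state)
print(path)                     # → [0, 2, 2, 1, 0], i.e. A=>D=>G=>I=>K
```

Ties (as at node C) are broken here by taking the first minimizing decision, so this script reports one of the equally optimal policies.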
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A.-H. Mohamed. Inspection, maintenance and replacement models. Computers & Operations Research, 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS '06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA '99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] J. Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] M.L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] A. Rangan, D. Thyagarajan, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] O. Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.
Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to the next stage. The following notation is used:

Ω_{U_k}(i) = {0, 1}    for i = 0
Ω_{U_k}(i) = {0, 1, 2} for i = 1
Ω_{U_k}(i) = {1, 2}    for i = 2
for k = 1, 2, 3

Ω_{U_0}(0) = {0, 1, 2} for k = 0

For example, Ω_{U_1}(0) = Ω_U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E, or U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω_{U_1}(2) = Ω_U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F, or u_1(2) = 2 for the transition D ⇒ G.
A sequence π = {μ_0, μ_1, …, μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, …, μ*_N}.
Dynamic and Cost Functions
The dynamic function of the example is simple, thanks to the notation used: f_k(i, u) = u.

The transition costs are defined as the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J*_0(0) = min_{U_k ∈ Ω_{U_k}(X_k)} [ ∑_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, …, N − 1
4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forward, by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2, μ_1(2) = 2).
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. It introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
51 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic, as it is in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, …, N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω_{X_k}.
Decision Space
At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω_{U_k}(i).
Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, …, N − 1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k + 1 is j, if the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviation from a desired terminal state.
Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k ∈ Ω_{U_k}(X_k)} E[ C_N(X_N) + ∑_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, …, N − 1

N              Number of stages
k              Stage
i              State at the current stage
j              State at the next stage
X_k            State at stage k
U_k            Decision (action) at stage k
ω_k(i, u)      Probabilistic function of the disturbance
C_k(j, u, i)   Cost function
C_N(i)         Terminal cost for state i
f_k(i, u, ω)   Dynamic function
J*_0(i)        Optimal cost-to-go starting from state i
5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]    (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} ∑_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]    (5.2)

Ω_{X_k}        State space at stage k
Ω_{U_k}(i)     Decision space at stage k for state i
P_k(j, u, i)   Transition probability function
5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts at the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*_N(i) = C_N(i), ∀i ∈ Ω_{X_N} (initialisation)

While k ≥ 0 do
    J*_k(i) = min_{u ∈ Ω_{U_k}(i)} ∑_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)], ∀i ∈ Ω_{X_k}
    U*_k(i) = argmin_{u ∈ Ω_{U_k}(i)} ∑_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)], ∀i ∈ Ω_{X_k}
    k ← k − 1

u          Decision variable
U*_k(i)    Optimal decision (action) at stage k for state i

The recursion finishes when the first stage is reached.
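The backward recursion above can be sketched as follows, on a hypothetical two-state repair problem (state 0 = working, state 1 = failed; action 0 = do nothing, action 1 = repair/replace). All probabilities and costs are made-up illustrative numbers, not values from the thesis.

```python
# Finite-horizon stochastic value iteration following the recursion above.

N = 4                                    # number of stages
states, actions = (0, 1), (0, 1)

def P(j, u, i):                          # P_k(j, u, i), stationary here
    if u == 1:                           # repair brings the unit back to work
        return 1.0 if j == 0 else 0.0
    if i == 0:                           # working unit fails with prob. 0.2
        return 0.8 if j == 0 else 0.2
    return 1.0 if j == 1 else 0.0        # an untouched failed unit stays failed

def C(j, u, i):                          # transition cost C_k(j, u, i)
    return 5.0 * (u == 1) + 10.0 * (j == 1)

C_N = {0: 0.0, 1: 10.0}                  # terminal cost C_N(i)

J = dict(C_N)                            # initialisation: J*_N(i) = C_N(i)
U = []                                   # optimal decision tables per stage
for k in range(N - 1, -1, -1):           # backward recursion
    Jk, Uk = {}, {}
    for i in states:
        q = {u: sum(P(j, u, i) * (C(j, u, i) + J[j]) for j in states)
             for u in actions}
        Uk[i] = min(q, key=q.get)        # argmin over admissible actions
        Jk[i] = q[Uk[i]]
    J = Jk
    U.insert(0, Uk)

print(J)      # J*_0(i) for both states
print(U[0])   # optimal decisions at stage 0
```

With these numbers, the computed policy repairs a failed unit at every stage and leaves a working unit alone.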
5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages;

• N_X state variables, where the size of the set of each state variable is S;

• N_U control variables, where the size of the set of each control variable is A.

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem thus increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
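The growth can be illustrated numerically. The sizes below (a 52-stage horizon, S = 10 states per variable, A = 2 actions per variable) are arbitrary assumptions chosen only to show the blow-up as variables are added.

```python
# Numerical illustration of the O(N * S**(2*N_X) * A**N_U) operation count.

def vi_operations(N, S, N_X, A, N_U):
    """Rough operation count of finite-horizon value iteration."""
    return N * S ** (2 * N_X) * A ** N_U

for n in (1, 2, 3, 4):
    # each extra state/decision variable multiplies the count by S**2 * A
    print(n, vi_operations(N=52, S=10, N_X=n, A=2, N_U=n))
```

With these (assumed) sizes, every additional state/decision variable pair multiplies the work by S² · A = 200, taking the count from about 10⁴ operations for one variable to about 8 · 10¹⁰ for four.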
5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for a component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It would also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while after a major failure a component should be replaced.
5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or will be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This will reduce the uncertainties, but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. Moreover, if there is no consumption, some generation units are stopped; this time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions on offshore wind farms.
5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This memoryless condition is very strong, and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models -
Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of convergence of the algorithms, [36] or the introductory chapter of [13] is recommended.
In practice, one scarcely faces problems with an infinite number of stages. An infinite horizon model can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {μ, μ, μ, …}, where μ is a function mapping the state space to the control space: for i ∈ Ω_X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω_U(i).

The objective is to find the optimal μ*, i.e. the policy that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_μ E[ lim_{N→∞} ∑_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, …, N − 1

μ        Decision policy
J*(i)    Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost incurred at stage k has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum converges (it is dominated by a decreasing geometric progression):

J*(X_0) = min_μ E[ lim_{N→∞} ∑_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, U_k, ω(X_k, μ(X_k))), k = 0, 1, …, N − 1

α        Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes be represented neither with a cost-free termination state nor with discounted costs.

To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) · ∑_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, U_k, ω(X_k, μ(X_k))), k = 0, 1, …, N − 1
6.2 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy μ*, the solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J_μ(i) = min_{u ∈ Ω_U(i)} ∑_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + J_μ(j)], ∀i ∈ Ω_X

J_μ(i)   Cost-to-go function of policy μ starting from state i
J*(i)    Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

J_μ(i) = min_{u ∈ Ω_U(i)} ∑_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + α · J_μ(j)], ∀i ∈ Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown that it converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be chosen to terminate the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
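A minimal sketch of value iteration for a discounted infinite horizon problem, with a sup-norm stopping criterion, is given below. The two-state repair chain, its costs and the discount factor are illustrative assumptions, not part of the thesis model.

```python
# Discounted infinite-horizon value iteration with a stopping criterion.

states, actions, alpha = (0, 1), (0, 1), 0.9

def P(j, u, i):                          # stationary P(j, u, i)
    if u == 1:                           # replace: back to working (state 0)
        return 1.0 if j == 0 else 0.0
    if i == 0:                           # working unit fails with prob. 0.2
        return 0.8 if j == 0 else 0.2
    return 1.0 if j == 1 else 0.0        # failed stays failed if untouched

def C(j, u, i):                          # stationary cost C(j, u, i)
    return 5.0 * (u == 1) + 10.0 * (j == 1)

J = {i: 0.0 for i in states}
for _ in range(10_000):
    Jn = {i: min(sum(P(j, u, i) * (C(j, u, i) + alpha * J[j])
                     for j in states)
                 for u in actions)
          for i in states}
    done = max(abs(Jn[i] - J[i]) for i in states) < 1e-8
    J = Jn
    if done:                             # successive iterates are close
        break

# greedy policy with respect to the converged cost-to-go function
policy = {i: min(actions,
                 key=lambda u: sum(P(j, u, i) * (C(j, u, i) + alpha * J[j])
                                   for j in states))
          for i in states}
print(policy)                            # keep working units, replace failed
```

The stopping test bounds the distance between successive iterates; for a discounted model this also bounds the distance to the optimal cost-to-go by a factor α/(1 − α).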
6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is applied iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ_0. Then it can be described by the following steps:
Step 1: Policy Evaluation

If μ^{q+1} = μ^q, stop the algorithm. Otherwise, J_{μ^q}(i), the solution of the following linear system, is calculated:

\[
J_{\mu^q}(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J_{\mu^q}(j) \right], \qquad \forall i \in \Omega_X
\]

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system under policy μ^q.
Step 2: Policy Improvement

A new policy is obtained by one greedy minimization step, as in the value iteration algorithm:

\[
\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^q}(j) \right]
\]

Go back to the policy evaluation step. The process stops when μ^{q+1} = μ^q.
At each iteration, the algorithm improves the policy. If the initial policy μ^0 is already good, the algorithm will converge quickly to the optimal solution.
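The two steps above can be sketched compactly. This is again an illustration with the assumed P[u, i, j]/C[u, i, j] array layout; a discount factor α < 1 is used here so that the Step 1 linear system is always solvable:

```python
import numpy as np

def policy_iteration(P, C, alpha=0.9):
    """Policy iteration: evaluate the current policy exactly (Step 1),
    then improve it greedily (Step 2); stop when mu^{q+1} = mu^q."""
    n_u, n_s, _ = P.shape
    mu = np.zeros(n_s, dtype=int)                  # initial policy mu^0
    while True:
        # Step 1: solve J = c_mu + alpha * P_mu J  (policy evaluation)
        P_mu = P[mu, np.arange(n_s)]               # row i is P(., mu(i), i)
        c_mu = np.einsum('ij,ij->i', P_mu, C[mu, np.arange(n_s)])
        J = np.linalg.solve(np.eye(n_s) - alpha * P_mu, c_mu)
        # Step 2: greedy policy improvement
        Q = np.einsum('uij,uij->ui', P, C) + alpha * (P @ J)
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):             # the policy is a solution
            return J, mu                           # of its own improvement
        mu = mu_new
```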
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ^k}(i) that must be chosen higher than the real value J_{μ^k}(i).
While m ≥ 0, do

\[
J^{m}_{\mu^k}(i) = \sum_{j \in \Omega_X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right], \qquad \forall i \in \Omega_X
\]

m ← m − 1

m: Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ^k} is approximated by J^0_{μ^k}.
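The evaluation step above replaces the exact linear solve by M sweeps of the fixed-policy Bellman operator. A sketch (discounted variant, assumed array layout as in the previous sketches):

```python
import numpy as np

def approximate_evaluation(P, C, mu, J_init, M=20, alpha=0.9):
    """Modified policy iteration, Step 1: estimate J_mu with M value
    iteration sweeps for the fixed policy mu, instead of solving the
    linear system exactly."""
    n_s = P.shape[1]
    P_mu = P[mu, np.arange(n_s)]      # transition matrix under policy mu
    c_mu = np.einsum('ij,ij->i', P_mu, C[mu, np.arange(n_s)])
    J = np.asarray(J_init, dtype=float).copy()
    for _ in range(M):                # m = M, M-1, ..., 1
        J = c_mu + alpha * (P_mu @ J)
    return J                          # approximation of J_mu
```

With growing M the estimate approaches the exact policy evaluation at a geometric rate α^M.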
6.6 Average Cost-to-go Problems
The methods presented in Sections 5.1–5.4 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X ∈ Ω_X, there is a unique λ_μ and vector h_μ such that

\[
h_\mu(X) = 0
\]
\[
\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right], \qquad \forall i \in \Omega_X
\]

This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation:

\[
\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right], \qquad \forall i \in \Omega_X
\]
\[
\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right], \qquad \forall i \in \Omega_X
\]
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. X is an arbitrary reference state, and h^0(i) is chosen arbitrarily.

\[
H^k = \min_{u \in \Omega_U(X)} \sum_{j \in \Omega_X} P(j, u, X) \cdot \left[ C(j, u, X) + h^k(j) \right]
\]
\[
h^{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k, \qquad \forall i \in \Omega_X
\]
\[
\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right], \qquad \forall i \in \Omega_X
\]
The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
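For a unichain model, the recursion above can be sketched as follows (illustrative only; subtracting the offset H^k keeps h bounded, and H^k converges to the optimal average cost λ*):

```python
import numpy as np

def relative_value_iteration(P, C, ref_state=0, iters=1000):
    """Relative value iteration for an average cost-to-go unichain MDP.
    P[u, i, j] and C[u, i, j] as in the previous sketches."""
    n_u, n_s, _ = P.shape
    c = np.einsum('uij,uij->ui', P, C)      # expected one-stage costs
    h = np.zeros(n_s)                       # h^0 chosen arbitrarily
    lam = 0.0
    for _ in range(iters):
        T = c + P @ h                       # (n_u, n_s) Bellman values
        lam = T[:, ref_state].min()         # H^k, taken at reference state X
        h = T.min(axis=0) - lam             # h^{k+1}; h(X) stays 0
    return lam, h, T.argmin(axis=0)
```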
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: X can be chosen arbitrarily.

Step 1: Policy Evaluation

If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) ∀i ∈ Ω_X, stop the algorithm. Otherwise, solve the system of equations:

\[
h^q(X) = 0
\]
\[
\lambda^q + h^q(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right], \qquad \forall i \in \Omega_X
\]

Step 2: Policy Improvement

\[
\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right], \qquad \forall i \in \Omega_X
\]

q = q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP,

\[
J_{\mu}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J_{\mu}(j) \right], \qquad \forall i \in \Omega_X
\]

J_μ(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω_X} J_μ(i)

Subject to:

\[
J_{\mu}(i) - \alpha \sum_{j \in \Omega_X} P(j, u, i) \cdot J_{\mu}(j) \le \sum_{j \in \Omega_X} P(j, u, i) \cdot C(j, u, i), \qquad \forall u, i
\]
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.

Let n and m denote the numbers of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m, and is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. In contrast, linear programming methods become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and the actions are not made continuously (that kind of problem belongs to optimal control theory).
SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input–output pairs), in order to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ, using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state X_k, is

\[
V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})
\]
V(X_k): Cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

\[
J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)
\]
V(i_m): Cost-to-go of the trajectory starting from state i after its m-th visit
A recursive form of the method can be formulated:

\[
J(i) := J(i) + \gamma \cdot \left[ V(i_m) - J(i) \right], \qquad \gamma = 1/m
\]

with m the number of the trajectory. From a trajectory point of view,

\[
J(X_k) := J(X_k) + \gamma_{X_k} \cdot \left[ V(X_k) - J(X_k) \right]
\]

where γ_{X_k} corresponds to 1/m, with m the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).
At each transition of the trajectory, the cost-to-go function of each state of the trajectory, J(X_k), is updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

\[
J(X_k) := J(X_k) + \gamma_{X_k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right], \qquad \forall k = 0, \ldots, l
\]
TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

\[
J(X_k) := J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right], \qquad \forall k = 0, \ldots, l
\]
Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is

\[
J(X_k) := J(X_k) + \gamma_{X_k} \cdot \left[ C(X_k, X_{k+1}) + J(X_{k+1}) - J(X_k) \right]
\]
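A tabular TD(0) sketch for stochastic shortest path policy evaluation. The episode-generator interface and the chain used in the test are hypothetical; the step sizes γ_x = 1/m follow the text:

```python
import random

def td0_evaluate(sample_episode, n_states, n_episodes=5000):
    """TD(0) policy evaluation from simulated trajectories.

    sample_episode() returns one terminating trajectory as a list of
    (X_k, X_{k+1}, C_k) transitions under the fixed policy."""
    J = [0.0] * n_states
    visits = [0] * n_states
    for _ in range(n_episodes):
        for x, x_next, cost in sample_episode():
            visits[x] += 1
            gamma = 1.0 / visits[x]       # gamma_x = 1/m
            # TD(0) update: J(x) <- J(x) + gamma * (C + J(x') - J(x))
            J[x] += gamma * (cost + J[x_next] - J[x])
    return J
```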
Q-factors: Once J_{μ^k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

\[
Q_{\mu^k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^k}(j) \right]
\]

Note that C(j, u, i) must be known. The improved policy is

\[
\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q_{\mu^k}(i, u)
\]

This is in fact an approximate version of the policy iteration algorithm, since J_{μ^k} and Q_{μ^k} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

\[
Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J^*(j) \right] \tag{7.1}
\]

The optimality equation can be rewritten in terms of Q-factors:

\[
J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u) \tag{7.2}
\]

By combining the two equations, we obtain

\[
Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v) \right] \tag{7.3}
\]
Q^*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do

\[
U_k = \arg\min_{u \in \Omega_U(X_k)} Q(X_k, u)
\]
\[
Q(X_k, U_k) := (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot \left[ C(X_k, X_{k+1}, U_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u) \right]
\]

with γ defined as for TD.
The exploration/exploitation trade-off: The convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called the greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
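A tabular Q-learning sketch. The ε-greedy rule is one standard way to realize the exploration/exploitation trade-off described above, and the simulator interface is an assumption of this sketch, not the thesis' notation:

```python
import random

def q_learning(step, n_states, n_actions, episodes=20000, eps=0.1):
    """Tabular Q-learning for a stochastic shortest path problem.
    step(x, u) -> (x_next, cost, done) simulates one transition."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        x, done = 0, False
        while not done:
            if random.random() < eps:       # exploration phase
                u = random.randrange(n_actions)
            else:                           # exploitation: greedy control
                u = min(range(n_actions), key=lambda a: Q[x][a])
            x_next, cost, done = step(x, u)
            visits[x][u] += 1
            g = 1.0 / visits[x][u]          # step size gamma, as for TD
            target = cost + (0.0 if done else min(Q[x_next]))
            Q[x][u] = (1 - g) * Q[x][u] + g * target
            x = x_next
    return Q
```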
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the preceding section on each sample of experience;

- Building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system, through simulation, with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J_μ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the table representation investigated previously, J_μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.
Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, the error between the true function and the approximated one, J_μ(i) − J̃(i, r), should be minimized.
There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.
A general approach to a supervised learning problem can be:
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
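As a minimal illustration of the approximation idea J̃(i, r), here is a linear-in-parameters architecture fitted by least squares; the polynomial features and the quadratic example are assumptions of this sketch, not from the thesis:

```python
import numpy as np

def fit_cost_to_go(states, targets, features):
    """Least-squares fit of an approximate cost-to-go J(i, r) = phi(i)^T r
    from (state, sampled cost-to-go) training pairs."""
    Phi = np.array([features(i) for i in states])           # design matrix
    r, *_ = np.linalg.lstsq(Phi, np.asarray(targets, dtype=float),
                            rcond=None)
    return r                                                # only r is stored

# hypothetical features for a scalar state: phi(i) = (1, i, i^2)
def poly_features(i):
    return np.array([1.0, i, i * i])
```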
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original maintenance time of each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating-unit maintenance scheduling problem. The system considered is composed of n generating units. The state of each unit is the number of remaining stages of maintenance, including the possible failure during the stage of a unit not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a three-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given. It considers three deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Both major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Process
Many condition-based maintenance models based on SMDPs have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of five deterioration states and one failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is a consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; they are also more complex. The models found in the literature were considering only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a pre-existing model of the system: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance optimization and scheduling
  Method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; possible approaches are average cost-to-go (continuous-time condition monitoring maintenance optimization), discounted (short-term maintenance optimization), and shortest path
  Methods (classical methods for MDPs): value iteration (VI), which can converge fast for a high discount factor; policy iteration (PI), faster in general; linear programming, which allows possible additional constraints but a state space more limited than with VI and PI

Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than classical MDP methods
  Possible application in maintenance optimization: same as MDPs, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: inspection-based maintenance
  Method: same as MDPs (average cost-to-go approach)
  Advantages/disadvantages: complex
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was incorporated in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

N_E: Number of electricity scenarios
N_W: Number of working states for the component
N_PM: Number of preventive maintenance states for one component
N_CM: Number of corrective maintenance states for one component

Costs

C_E(s, k): Electricity cost at stage k for the electricity state s
C_I: Cost per stage for interruption
C_PM: Cost per stage of preventive maintenance
C_CM: Cost per stage of corrective maintenance
C_N(i): Terminal cost if the component is in state i

Variables

i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage

State and Control Space

x1_k: Component state at stage k
x2_k: Electricity state at stage k

Probability Functions

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state W_i

Sets

Ω_{x1}: Component state space
Ω_{x2}: Electricity state space
Ω_U(i): Decision space for state i

State Notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component, in order to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector X_k is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario; N_X = 2. The state of the system is thus represented by a vector, as in (9.1):

\[
X_k = \begin{pmatrix} x1_k \\ x2_k \end{pmatrix}, \qquad x1_k \in \Omega_{x1},\; x2_k \in \Omega_{x2} \tag{9.1}
\]

Ω_{x1} is the set of possible states for the component, and Ω_{x2} is the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond to N_CM and N_PM, respectively.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This second approach was implemented. In both cases, the corresponding number of W states is N_W = Tmax/Ts, or the closest integer.
Figure 9.1: Example of the Markov decision process for one component, with N_CM = 3, N_PM = 2, N_W = 4 (states W0–W4, PM1, CM1, CM2; under u = 0, each W_q moves to W_{q+1} with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q)). Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_{x1} = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_{x1} = {W0, ..., W_{N_W}, PM1, ..., PM_{N_PM − 1}, CM1, ..., CM_{N_CM − 1}}
Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively a dry, a normal and a wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower constitutes a large part of the electricity generation in Sweden, and it is a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure: electricity price (SEK/MWh) versus stage for three scenarios; the prices range roughly from 200 to 500 SEK/MWh around stages k−1, k and k+1.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = { {0, 1}  if i1 ∈ {W1, ..., WNW}
           ∅       otherwise }
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · P_k(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if NPM = 1 (respectively NCM = 1), then PM1 (respectively CM1) corresponds to W0.
Electricity State

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0   CM1      λ(Wq)
WNW                         0   WNW      1 − λ(WNW)
WNW                         0   CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1   PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1    1
PM(NPM−1)                   ∅   W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1    1
CM(NCM−1)                   ∅   W0       1
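As a minimal sketch, the stationary rules of Table 9.1 can be written as a Python function (the string encoding of states, the function names and the failure-probability argument are illustrative choices, not from the thesis):

```python
def component_transition(j1, u, i1, lam, N_W, N_PM, N_CM):
    """P(j1, u, i1) from Table 9.1.  States are encoded as strings
    'W0'..'W{N_W}', 'PM1'..'PM{N_PM-1}', 'CM1'..'CM{N_CM-1}'.
    lam maps a working state 'Wq' to the failure probability Ts*lambda(q*Ts)."""
    if i1.startswith('W'):
        q = int(i1[1:])
        if u == 1:                       # preventive replacement decided
            return 1.0 if j1 == 'PM1' else 0.0
        if j1 == 'CM1':                  # failure during the stage
            return lam(i1)
        nxt = 'W%d' % min(q + 1, N_W)    # ageing; WNW loops on itself
        return 1.0 - lam(i1) if j1 == nxt else 0.0
    # Maintenance states advance deterministically, then return to W0.
    kind, q = i1[:2], int(i1[2:])
    last = (N_PM if kind == 'PM' else N_CM) - 1
    nxt = 'W0' if q == last else '%s%d' % (kind, q + 1)
    return 1.0 if j1 == nxt else 0.0
```

For maintenance states, where the decision set is empty, the `u` argument is simply ignored.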
Table 9.2: Example of transition matrices for electricity scenarios

P1_E = [ 1   0   0  ;  0   1   0  ;  0   0   1  ]
P2_E = [ 1/3 1/3 1/3;  1/3 1/3 1/3;  1/3 1/3 1/3]
P3_E = [ 0.6 0.2 0.2;  0.2 0.6 0.2;  0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
P_k(j2, i2)   P1_E P1_E P1_E P3_E P3_E P2_E P2_E P2_E P3_E P1_E P1_E P1_E
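The stage-dependent electricity transition probability P_k(j2, i2) can be sketched as a list of matrices indexed by stage (values transcribed from Tables 9.2 and 9.3; the variable names are illustrative):

```python
# Electricity-scenario transition matrices from Table 9.2.
P1_E = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
P2_E = [[1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3]]
P3_E = [[0.6, 0.2, 0.2],
        [0.2, 0.6, 0.2],
        [0.2, 0.2, 0.6]]

# Schedule from Table 9.3: which matrix applies at each of the 12 stages.
schedule = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E,
            P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def P_elec(k, j2, i2):
    """Probability that the electricity state moves from i2 to j2 at stage k."""
    return schedule[k][i2][j2]
```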
9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: CCM or CPM
• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state of the component.
Table 9.4: Transition costs

i1                          u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0   CM1      CI + CCM
WNW                         0   WNW      G · Ts · CE(i2, k)
WNW                         0   CM1      CI + CCM
Wq                          1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1    CI + CPM
PM(NPM−1)                   ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1    CI + CCM
CM(NCM−1)                   ∅   W0       CI + CCM
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to also do maintenance on some components of the system that are still working but would need maintenance soon.

This can be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers
NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c

Costs
CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i

Variables
ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  State of the electricity at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
jNC+1                  State of the electricity at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space
xc_k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
xNC+1_k                  Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k

Probability functions
λc(i)   Failure probability function for component c

Sets
Ω_xc       State space for component c
Ω_xNC+1    Electricity state space
Ω_uc(ic)   Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.
• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance, of either kind, is done on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh is produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal-stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x1_k, ..., xNC_k, xNC+1_k)   (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.

Component space
The numbers of CM and PM states for component c are NCMc and NPMc, respectively. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity space
Same as in Section 8.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u1_k, u2_k, ..., uNC_k)   (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}:  Ω_uc(ic) = { {0, 1}  if ic ∈ {W0, ..., WNWc}
                                 ∅       otherwise }
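The joint decision space is the Cartesian product of the per-component decision spaces, which can be enumerated with a short Python sketch (the string state encoding and `None` placeholder for components with no admissible decision are illustrative conventions):

```python
from itertools import product

def decision_space(component_states):
    """Enumerate all admissible decision vectors U_k for a list of component
    states: working components ('W...') may choose 0 or 1, components in
    maintenance have no decision (marked None)."""
    per_component = []
    for ic in component_states:
        if ic.startswith('W'):           # working: replace or do nothing
            per_component.append((0, 1))
        else:                            # in maintenance: no decision
            per_component.append((None,))
    return list(product(*per_component))
```

For example, with two working components and one in corrective maintenance, `decision_space(['W2', 'CM1', 'W0'])` yields the four vectors combining the choices of the two working components.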
9.2.4.3 Transition Probability

The component state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P_k(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity states, P_k(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 8.1.

Component state transitions

The component state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.

Case 1
If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2
If one of the components is in maintenance, or preventive maintenance is decided for some component:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with P^c = { P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
             1              otherwise, if jc = ic (the component does not age while the system is stopped)
             0              otherwise }
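One reading of the two cases above can be sketched in Python, assuming a per-component probability function such as the one given for the one-component model (all names, and the tuple encoding of joint states and decisions, are illustrative):

```python
def system_transition(j, u, i, P_c, W_states):
    """Joint component-state transition probability for Cases 1 and 2.
    j, u, i: tuples over components; P_c(jc, uc, ic): one-component
    transition probability; W_states: set of working states."""
    all_working = all(ic in W_states for ic in i)
    no_maintenance = all(uc == 0 for uc in u)
    p = 1.0
    if all_working and no_maintenance:
        # Case 1: the system runs, every component ages independently.
        for jc, ic in zip(j, i):
            p *= P_c(jc, 0, ic)
    else:
        # Case 2: the system is stopped; components being maintained advance,
        # the remaining working components stay frozen in their state.
        for jc, uc, ic in zip(j, u, i):
            if uc == 1 or ic not in W_states:
                p *= P_c(jc, 1, ic)
            else:
                p *= 1.0 if jc == ic else 0.0
    return p
```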
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is incurred, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑_{c=1}^{NC} C^c

with C^c = { CCMc   if ic ∈ {CM1, ..., CM(NCMc−1)} or jc = CM1
             CPMc   if ic ∈ {PM1, ..., PM(NPMc−1)} or jc = PM1
             0      otherwise }
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, instead of an individual decision space for each component state variable.
• Other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality, there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecast states. It could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of dynamic programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate, the Value Iteration algorithm can be better. Linear programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of dynamic programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of dynamic programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields, such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using dynamic programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states, to limit the complexity of the model.
Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
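The backward recursion above can be reproduced with a short Python script (the arc costs are transcribed directly from the computation; the data-structure choices are illustrative):

```python
# Arc costs C[(k, i, j)]: cost of moving from state i at stage k to state j
# at stage k+1, transcribed from the calculation above.
C = {(3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
     (2, 0, 0): 2, (2, 0, 1): 5, (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
     (2, 2, 1): 1, (2, 2, 2): 2,
     (1, 0, 0): 4, (1, 0, 1): 6, (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
     (1, 2, 1): 5, (1, 2, 2): 2,
     (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3}

J = {(4, 0): 0}                                # terminal cost at stage 4
policy = {}
for k in range(3, -1, -1):                     # backward recursion
    for i in {i for (kk, i, _) in C if kk == k}:
        options = {j: C[(k, i, j)] + J[(k + 1, j)]
                   for (kk, ii, j) in C if (kk, ii) == (k, i)}
        j_best = min(options, key=options.get)
        J[(k, i)], policy[(k, i)] = options[j_best], j_best

print(J[(0, 0)])   # optimal cost-to-go from A: 8
```

The resulting policy reproduces the optimal path A to D (state 2), G, I, K.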
Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
the initial state is reached. The optimal decision sequence is then obtained forward, by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ0, μ1, μ2, μ3}, with μk(i) = u*_k(i) (for example, μ1(1) = 2, μ1(2) = 2).
Chapter 5

Finite Horizon Models

In this chapter, a stochastic version of the dynamic programming model of Chapter 3 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 3: it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as follows.

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ Ω_Xk.

Decision Space

At each decision epoch, the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω_Uk(i).
Dynamics of the System and Transition Probabilities

Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):

Xk+1 = fk(Xk, Uk, ω),  k = 0, 1, ..., N − 1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:

P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(xk+1 = j, uk = u, xk = i)

If the transition (i, j) occurs at stage k when the decision is u, then a cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost CN(i) can be used to penalize deviation from a desired terminal state.
Objective Function

The objective is to determine the sequence of decisions that minimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:

J*(X0) = min_{Uk ∈ Ω_Uk(Xk)} E[ CN(XN) + ∑_{k=0}^{N−1} C_k(Xk+1, Uk, Xk) ]

subject to Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)),  k = 0, 1, ..., N − 1
N           Number of stages
k           Stage
i           State at the current stage
j           State at the next stage
Xk          State at stage k
Uk          Decision action at stage k
ωk(i, u)    Probabilistic function of the disturbance
C_k(j, u, i)  Cost function
CN(i)       Terminal cost for state i
fk(i, u, ω)   Dynamic function
J*_0(i)     Optimal cost-to-go starting from state i
5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u ∈ Ω_Uk(i)} E[ C_k(i, u) + J*_{k+1}(fk(i, u, ω)) ]   (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u ∈ Ω_Uk(i)} ∑_{j ∈ Ω_Xk+1} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   (5.2)

Ω_Xk        State space at stage k
Ω_Uk(i)     Decision space at stage k for state i
P_k(j, u, i)  Transition probability function
5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system:

J*_N(i) = CN(i), ∀i ∈ Ω_XN  (initialisation)

While k ≥ 0 do:
  J*_k(i) = min_{u ∈ Ω_Uk(i)} ∑_{j ∈ Ω_Xk+1} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)], ∀i ∈ Ω_Xk
  U*_k(i) = argmin_{u ∈ Ω_Uk(i)} ∑_{j ∈ Ω_Xk+1} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)], ∀i ∈ Ω_Xk
  k ← k − 1

u        Decision variable
U*_k(i)  Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
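The backward recursion above can be sketched as a generic Python routine (the interface, with callables for the state sets, decision sets, probabilities and costs, is an illustrative choice, not from the thesis):

```python
def value_iteration(N, states, decisions, P, C, C_term):
    """Finite horizon stochastic DP by backward recursion (Section 5.3).
    states(k): states at stage k; decisions(k, i): admissible actions;
    P(k, j, u, i): transition probability; C(k, j, u, i): transition cost;
    C_term(i): terminal cost.  Returns cost-to-go J and policy U."""
    J = {(N, i): C_term(i) for i in states(N)}     # initialisation J*_N = C_N
    U = {}
    for k in range(N - 1, -1, -1):                 # backward over stages
        for i in states(k):
            best_u, best_val = None, float('inf')
            for u in decisions(k, i):
                val = sum(P(k, j, u, i) * (C(k, j, u, i) + J[(k + 1, j)])
                          for j in states(k + 1))
                if val < best_val:
                    best_u, best_val = u, val
            J[(k, i)], U[(k, i)] = best_val, best_u
    return J, U
```

The same routine solves the replacement model of Chapter 9 once P and C are built from the corresponding tables.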
5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages;
• NX state variables, where the size of the set for each state variable is S;
• NU control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·NX) · A^NU). The complexity of the problem thus increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
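For illustration (the numbers below are arbitrary, not from the thesis), even a modest problem makes this operation count explode:

```python
# Operation count N * S**(2*NX) * A**NU for a one-year horizon of weekly
# stages, three state variables with 10 values each, three binary controls.
N, S, NX, A, NU = 52, 10, 3, 2, 3
ops = N * S**(2 * NX) * A**NU
print(ops)   # 416000000 elementary operations
```

Adding a fourth 10-valued state variable multiplies this count by a further factor of 100.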
5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for a maintenance model based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for a component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered in order to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It would also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while after a major failure a component should be replaced.
5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model over its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is low consumption, some generation units are stopped, and this time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing the maintenance actions of offshore wind farms.
553 Time Lags
An important assumption of a DP model is that the dynamic of the system onlydepends on the actual state of the system (and possibly on the time if the systemdynamic is not stationary)
This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few previous states) to overcome this assumption: variables are added to the DP model to keep the previous states in memory. The computational price is once again very high.
For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the previous stage. It would give information about the dynamic of the deterioration process.
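This state augmentation can be illustrated with a minimal sketch (the four deterioration levels are a hypothetical example): pairing the previous and current deterioration levels restores the Markov property, at the price of a combinatorially larger state space.

```python
from itertools import product

# Hypothetical example: 4 deterioration levels for a single asset.
det_levels = range(4)

# Augmented state = (deterioration at previous stage, deterioration now).
# The dynamics may then depend on the previous level while the augmented
# process remains Markovian.
augmented_states = list(product(det_levels, det_levels))

# The computational price: the state space grows from 4 to 4*4 = 16 states.
print(len(augmented_states))
```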
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamic of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = (μ, μ, μ, ...), where μ is a function mapping the state space to the control space: for i ∈ Ω_X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω_U(i).
The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are paid.
J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N-1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N-1

μ: Decision policy
J*(i): Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost at stage k for a discounted IHSDP has the form α^k · C_ij(u).
As C_ij(u) is bounded, the infinite sum will converge (decreasing geometric progression).
J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N-1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N-1

α: Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes neither be represented with a cost-free termination state nor be discounted.
To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize
J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N-1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N-1
6.2 Optimality Equations
The optimality equations are formulated using the transition probability function P_ij(u).
The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of the Bellman equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):
J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + J*(j)], ∀i ∈ Ω_X

J_μ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is
J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + α · J*(j)], ∀i ∈ Ω_X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1-α). For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined for the algorithm.
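As an illustration, the value iteration recursion for a discounted IHSDP can be sketched as follows (a sketch only, assuming a finite state and control space with known transition probabilities; the array layout P[u][i][j] is an assumption of this example):

```python
import numpy as np

def value_iteration(P, C, alpha, tol=1e-8):
    """Value iteration for a discounted MDP.

    P[u, i, j]: probability of moving from state i to j under control u.
    C[u, i, j]: cost of that transition.  alpha: discount factor (< 1).
    Returns the optimal cost-to-go J and a greedy policy mu."""
    n_u, n_s, _ = P.shape
    J = np.zeros(n_s)
    while True:
        # Q[u, i] = sum_j P(j, u, i) * (C(j, u, i) + alpha * J(j))
        Q = np.einsum('uij,uij->ui', P, C) + alpha * np.einsum('uij,j->ui', P, J)
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, Q.argmin(axis=0)
        J = J_new
```

Since the Bellman operator is a contraction for α < 1, the stopping tolerance makes the loop terminate after a finite number of sweeps.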
An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ_0. It can then be described by the following steps.
Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μ_q}(i) is calculated as the solution of the following linear system:

J_{μ_q}(i) = Σ_{j∈Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_{μ_q}(j)], ∀i ∈ Ω_X

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.
Step 2: Policy Improvement

A new policy is obtained using one step of the value iteration algorithm:

μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ_q}(j)], ∀i ∈ Ω_X

Go back to the policy evaluation step. The process stops when μ_{q+1} = μ_q.
At each iteration the algorithm improves the policy. If the initial policy μ_0 is already good, the algorithm will converge quickly to the optimal solution.
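The two steps can be sketched for a discounted problem (an illustrative sketch; the array layout P[u][i][j] and the discount factor are assumptions of this example, not part of the original formulation):

```python
import numpy as np

def policy_iteration(P, C, alpha):
    """Policy iteration for a discounted MDP (layout as in value iteration)."""
    n_u, n_s, _ = P.shape
    mu = np.zeros(n_s, dtype=int)          # initial policy mu_0
    while True:
        # Step 1: policy evaluation -- solve the linear system
        # J(i) = sum_j P(j, mu(i), i) * (C(j, mu(i), i) + alpha * J(j))
        P_mu = P[mu, np.arange(n_s)]                       # rows P(., mu(i), i)
        c_mu = np.einsum('ij,ij->i', P_mu, C[mu, np.arange(n_s)])
        J = np.linalg.solve(np.eye(n_s) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.einsum('uij,uij->ui', P, C) + alpha * np.einsum('uij,j->ui', P, J)
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):     # policy solves its own improvement
            return J, mu
        mu = mu_new
```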
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each policy evaluation step, a finite number M of value iteration steps to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i). Starting from m = M - 1:

While m ≥ 0 do

J^m_{μ_k}(i) = Σ_{j∈Ω_X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j)], ∀i ∈ Ω_X

m ← m - 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
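The evaluation step then reduces to M fixed-policy sweeps (a sketch for a discounted problem; names and array layout are illustrative):

```python
import numpy as np

def evaluate_policy_approx(P, C, mu, alpha, M, J_init):
    """Approximate policy evaluation for modified policy iteration:
    M value-iteration sweeps under a fixed policy mu, started from
    J_init (chosen above the true value, as in the text)."""
    n_s = len(mu)
    P_mu = P[mu, np.arange(n_s)]       # transition rows under policy mu
    c_mu = np.einsum('ij,ij->i', P_mu, C[mu, np.arange(n_s)])
    J = np.asarray(J_init, dtype=float).copy()
    for _ in range(M):                 # corresponds to m = M-1, ..., 0
        J = c_mu + alpha * P_mu @ J
    return J
```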
6.6 Average Cost-to-go Problems
The methods presented in Sections 5.1-5.4 can not be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X ∈ Ω_X, there is a unique λ_μ and vector h_μ such that

h_μ(X) = 0

λ_μ + h_μ(i) = Σ_{j∈Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)], ∀i ∈ Ω_X
This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X

μ*(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h^0(i) is chosen arbitrarily.

H^k = min_{u∈Ω_U(X)} Σ_{j∈Ω_X} P(j, u, X) · [C(j, u, X) + h^k(j)]

h^{k+1}(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)] - H^k, ∀i ∈ Ω_X

μ^{k+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)], ∀i ∈ Ω_X
The sequence h^k will converge if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.
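A minimal sketch of the recursion (assuming a unichain, aperiodic model with known transition probabilities; the array layout P[u][i][j] and the fixed iteration count are assumptions of this example):

```python
import numpy as np

def relative_value_iteration(P, C, ref=0, n_iter=500):
    """Relative value iteration for the average cost per stage problem.

    Assumes a unichain (and aperiodic) model.  ref is the arbitrary
    reference state X.  Returns the average cost and the differential
    cost vector h."""
    n_u, n_s, _ = P.shape
    h = np.zeros(n_s)
    for _ in range(n_iter):
        Q = np.einsum('uij,uij->ui', P, C) + np.einsum('uij,j->ui', P, h)
        H = Q[:, ref].min()            # offset taken at the reference state
        h = Q.min(axis=0) - H          # keeps h(ref) = 0
    return H, h                        # H converges to the average cost
```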
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.
Initialisation: X can be chosen arbitrarily.
Step 1: Evaluation of the policy

If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω_X, stop the algorithm. Else, solve the system of equations

h_q(X) = 0

λ_q + h_q(i) = Σ_{j∈Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)], ∀i ∈ Ω_X
Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_q(j)], ∀i ∈ Ω_X

q ← q + 1, and go back to Step 1.
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP, the optimal cost-to-go function satisfies

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω_X} J(i)

Subject to J(i) - α · Σ_{j∈Ω_X} P(j, u, i) · J(j) ≤ Σ_{j∈Ω_X} P(j, u, i) · C(j, u, i), ∀u, i
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
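The LP reformulation above can be sketched with an off-the-shelf solver (a sketch only; the use of SciPy's linprog and the array layout P[u][i][j] are assumptions of this example):

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """LP solution of a discounted cost-minimization MDP:
    maximize sum_i J(i)
    s.t.  J(i) - alpha*sum_j P(j,u,i)*J(j) <= sum_j P(j,u,i)*C(j,u,i)  for all i, u."""
    n_u, n_s, _ = P.shape
    A_ub, b_ub = [], []
    for u in range(n_u):
        for i in range(n_s):
            row = -alpha * P[u, i]
            row[i] += 1.0                      # coefficient of J(i)
            A_ub.append(row)
            b_ub.append(P[u, i] @ C[u, i])     # expected one-stage cost
    res = linprog(c=-np.ones(n_s),             # minimize -sum J = maximize sum J
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_s)
    return res.x                               # the optimal cost-to-go J*
```

Additional operational constraints could be appended as extra rows of the constraint matrix, which is the motivation given in the text.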
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms [28] and [29] are recommended
Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m; a DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, the algorithm will converge quite quickly if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].
6.9 Semi-Markov Decision Processes
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).
SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and actions are not taken continuously (that kind of problem refers to optimal control theory).
SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform a SMDP model into a model solvable with the methods presented previously in this chapter.
SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) to be able to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.
The cost-to-go resulting from the trajectory, starting from the state X_k, is

V(X_k) = Σ_{n=k}^{N-1} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i after its m-th visit
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) - J(i)], with γ = 1/m, where m is the number of the visit.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) - J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).
At each transition of the trajectory, the cost-to-go function of the states visited so far is updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)], ∀k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l-k} · [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)], ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) - J(X_k)]
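The TD(0) update can be sketched as follows (a sketch for a stochastic shortest path problem; simulate_step is a hypothetical simulator of the system, and the step size γ = 1/m follows the text):

```python
import random

def td0_evaluate(simulate_step, states, start_states, mu, n_episodes=2000):
    """TD(0) evaluation of a fixed policy mu on a stochastic shortest path
    problem.  simulate_step(i, u) -> (j, cost, done) is a hypothetical
    simulator of the system (or of its Monte Carlo model)."""
    J = {i: 0.0 for i in states}        # tabular cost-to-go estimate
    visits = {i: 0 for i in states}
    for _ in range(n_episodes):
        i = random.choice(start_states)
        done = False
        while not done:
            j, cost, done = simulate_step(i, mu[i])
            visits[i] += 1
            gamma = 1.0 / visits[i]     # step size 1/m, m = visits to state i
            # TD(0) update: J(Xk) += gamma * [C(Xk, Xk+1) + J(Xk+1) - J(Xk)]
            J[i] += gamma * (cost + J[j] - J[i])
            i = j
    return J
```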
Q-factors
Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_{μ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q_{μ_k}(i, u)
It is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated using the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 - γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The trade-off between exploration and exploitation. Convergence of the algorithm to the optimal solution would require that all the pairs (i, u) are tried infinitely often, which is not realistic. In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
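A tabular Q-learning sketch for a stochastic shortest path problem, with an epsilon-greedy exploration/exploitation trade-off (sample_transition is a hypothetical simulator; the value of eps and the decreasing step size rule are illustrative choices, not prescribed by the text):

```python
import random

def q_learning_ssp(sample_transition, states, actions, terminal,
                   n_episodes=500, eps=0.1):
    """Tabular Q-learning for a stochastic shortest path problem with
    epsilon-greedy exploration.  sample_transition(i, u) -> (j, cost) is
    a hypothetical simulator of the system."""
    Q = {(i, u): 0.0 for i in states for u in actions}
    counts = {k: 0 for k in Q}
    start_states = [s for s in states if s != terminal]
    for _ in range(n_episodes):
        i = random.choice(start_states)
        while i != terminal:
            if random.random() < eps:                  # exploration phase
                u = random.choice(actions)
            else:                                      # exploitation: greedy control
                u = min(actions, key=lambda a: Q[(i, a)])
            j, cost = sample_transition(i, u)
            counts[(i, u)] += 1
            gamma = 1.0 / counts[(i, u)]               # decreasing step size
            target = cost + min(Q[(j, a)] for a in actions)
            Q[(i, u)] = (1 - gamma) * Q[(i, u)] + gamma * target
            i = j
    return Q
```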
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system, through simulation, with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J_μ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the tabular representation previously investigated, J_μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.
Function approximators must generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) - J̃(i, r).
There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, and Bayesian statistics.
A general approach to a supervised learning problem can be:
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
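As a minimal illustration of the idea, a linear approximation J̃(i, r) = rᵀφ(i) can be fitted by least squares from simulated cost-to-go samples (a sketch; the feature map φ and the sample format are hypothetical inputs of this example):

```python
import numpy as np

def fit_cost_to_go(features, samples):
    """Least-squares fit of a linear approximation J~(i, r) = r . phi(i)
    from (state, observed cost-to-go) samples.  `features` is a
    hypothetical feature map phi chosen from insight about the problem."""
    Phi = np.array([features(i) for i, _ in samples])      # design matrix
    targets = np.array([v for _, v in samples])            # sampled costs-to-go
    r, *_ = np.linalg.lstsq(Phi, targets, rcond=None)      # minimize ||Phi r - v||
    return r
```

Only the vector r is stored, instead of one table entry per state, which is what makes the approach viable for large state spaces.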
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] a SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.
The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model can not be too high for the model to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an existing model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance optimization scheduling
- Method: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model
- Possible approaches: average cost-to-go (continuous-time condition monitoring maintenance optimization), discounted (short-term maintenance optimization), shortest path
- Methods (classical methods for MDP):
  - Value iteration (VI): can converge fast for a high discount factor
  - Policy iteration (PI): faster in general
  - Linear programming: possible additional constraints; state space more limited than for VI & PI

Approximate Dynamic Programming
- Characteristics: can handle large state spaces compared to classical MDP methods
- Possible application in maintenance optimization: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application in maintenance optimization: optimization for inspection-based maintenance
- Method: same as MDP (average cost-to-go approach)
- Advantages/disadvantages: complex
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component, and is then generalized to multiple components. Both these models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, so as to be operational later and avoid maintenance in a profitable period. This idea was adopted in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries a large part of the electricity generation is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could serve as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE: Number of electricity scenarios
NW: Number of working states for the component
NPM: Number of preventive maintenance states for one component
NCM: Number of corrective maintenance states for one component
Costs
CE(s, k): Electricity cost at stage k for the electricity state s
CI: Cost per stage for interruption
CPM: Cost per stage of preventive maintenance
CCM: Cost per stage of corrective maintenance
CN(i): Terminal cost if the component is in state i
Variables
i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage
State and Control Space
x1k: Component state at stage k
x2k: Electricity state at stage k
Probability function
λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state Wi
Sets
Ωx1: Component state space
Ωx2: Electricity state space
ΩU(i): Decision space for state i
States notations
W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.
• At each stage it is possible to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.
• If the system is not working, a cost for interruption CI per stage is considered.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. The electricity price may switch from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):
Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2    (9.1)
Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by the state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can, for example, correspond to the age at which λ(t) > 50%. The latter approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, rounded to the closest integer.
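As a sketch of how NW could be computed under the first truncation rule, assuming a hypothetical Weibull failure rate; the parameters beta, eta, lambda_max and Ts below are illustrative values, not from the thesis:

```python
# Hypothetical increasing Weibull failure rate: lambda(t) = (beta/eta) * (t/eta)**(beta-1)
beta, eta = 3.0, 10.0     # shape and scale (years), illustrative values only
lambda_max = 0.5          # failure-rate limit at which replacement is always made
Ts = 0.25                 # stage length (years)

def failure_rate(t):
    return (beta / eta) * (t / eta) ** (beta - 1)

# Tmax solves lambda(Tmax) = lambda_max (closed form for a Weibull rate)
Tmax = eta * (lambda_max * eta / beta) ** (1.0 / (beta - 1))

# Number of W states: Tmax / Ts rounded to the closest integer
NW = round(Tmax / Ts)
print(Tmax, NW)
```

With these illustrative numbers, Tmax is about 12.9 years and the age axis is discretized into NW = 52 working states.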
[Figure: Markov chain over the states W0, ..., W4, PM1, CM1, CM2. For u = 0, each Wq moves to Wq+1 (or W4 stays in W4) with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); the maintenance states PM1, CM1, CM2 advance with probability 1.]

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
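The construction of this state space can be sketched in a few lines, here for the example values of Figure 9.1 (representing states as strings is an implementation choice, not part of the model):

```python
NW, NPM, NCM = 4, 2, 3   # values from the example in Figure 9.1

# W0..WNW, PM1..PM(NPM-1), CM1..CM(NCM-1); PM_NPM and CM_NCM coincide with W0
omega_x1 = ([f"W{q}" for q in range(NW + 1)]
            + [f"PM{q}" for q in range(1, NPM)]
            + [f"CM{q}" for q in range(1, NCM)])
print(omega_x1)
# -> ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2']
```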
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden, and it is a cheap source of energy. Consequently, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
[Plot: electricity price (SEK/MWh, roughly 200 to 500) as a function of the stage, for Scenario 1, Scenario 2 and Scenario 3.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance
The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}; ΩU(i) = ∅ otherwise.
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,
P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).
The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1 | u | j1 | P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | λ(Wq)
WNW | 0 | WNW | 1 − λ(WNW)
WNW | 0 | CM1 | λ(WNW)
Wq, q ∈ {0, ..., NW} | 1 | PM1 | 1
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | 1
PMNPM−1 | ∅ | W0 | 1
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | 1
CMNCM−1 | ∅ | W0 | 1
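A minimal sketch of Table 9.1 as a transition probability function, using the example dimensions of Figure 9.1; the per-stage failure probabilities in `lam` are hypothetical values chosen only for illustration:

```python
NW, NPM, NCM = 4, 2, 3
# Hypothetical per-stage failure probabilities lambda(Wq), illustrative only
lam = {f"W{q}": 0.05 + 0.03 * q for q in range(NW + 1)}

states = [f"W{q}" for q in range(NW + 1)] + ["PM1", "CM1", "CM2"]

def P(j, u, i):
    """Transition probability P(j1, u, i1) from Table 9.1 for this example."""
    if i.startswith("W"):
        q = int(i[1:])
        if u == 1:                        # preventive replacement starts
            return 1.0 if j == "PM1" else 0.0
        if j == "CM1":                    # failure during the stage
            return lam[i]
        # ageing; WNW stays in WNW
        return (1.0 - lam[i]) if j == f"W{min(q + 1, NW)}" else 0.0
    # deterministic maintenance chains; PM_NPM and CM_NCM coincide with W0
    chain = {"PM1": "W0", "CM1": "CM2", "CM2": "W0"}
    return 1.0 if j == chain[i] else 0.0

# every (state, feasible decision) row sums to one
for i in states:
    for u in ([0, 1] if i.startswith("W") else [None]):
        assert abs(sum(P(j, u, i) for j in states) - 1.0) < 1e-12
```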
Table 9.2: Example of transition matrices for electricity scenarios

P1E = [1 0 0; 0 1 0; 0 0 1]
P2E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]
P3E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):   0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):  P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:
• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: CCM or CPM
• Cost for interruption: CI
Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable. A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1 | u | j1 | Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | CI + CCM
WNW | 0 | WNW | G · Ts · CE(i2, k)
WNW | 0 | CM1 | CI + CCM
Wq | 1 | PM1 | CI + CPM
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | CI + CPM
PMNPM−1 | ∅ | W0 | CI + CPM
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | CI + CCM
CMNCM−1 | ∅ | W0 | CI + CCM
9.2 Multi-Component Model
In this section the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon anyway.
This can be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, so it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC: Number of components
NWc: Number of working states for component c
NPMc: Number of preventive maintenance states for component c
NCMc: Number of corrective maintenance states for component c
Costs
CPMc: Cost per stage of preventive maintenance for component c
CCMc: Cost per stage of corrective maintenance for component c
CNc(i): Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}: State of component c at the current stage
iNC+1: Electricity state at the current stage
jc, c ∈ {1, ..., NC}: State of component c at the next stage
jNC+1: Electricity state at the next stage
uc, c ∈ {1, ..., NC}: Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}: State of component c at stage k
xc: A component state
xNC+1k: Electricity state at stage k
uck: Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc: State space for component c
ΩxNC+1: Electricity state space
Ωuc(ic): Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.
• At each stage it is possible to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is done on the system, whatever its type.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.
Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1k, ..., uNCk)    (9.3)
The decision space for each decision variable is defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}; Ωuc(ic) = ∅ otherwise.
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component states transitions
The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.
Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)
Case 2

If one of the components is in maintenance, or if a preventive maintenance decision is taken, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC Pc

with

Pc = P(jc, 1, ic) if uc = 1 or ic ∉ {W0, ..., WNWc}
Pc = 1 if uc = 0, ic ∈ {W0, ..., WNWc} and jc = ic
Pc = 0 otherwise
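The two cases can be sketched as one function, assuming a single-component transition probability P(jc, uc, ic) as in Table 9.1 and a per-component set of working states supplied by the caller (both the interface and the stub used in the usage example are hypothetical):

```python
from math import prod

def joint_transition_prob(j, u, i, P, working):
    """Joint transition probability P((j1..jNC), (u1..uNC), (i1..iNC)).

    P(jc, uc, ic) is the single-component transition probability and
    working[c] the set of working states of component c.
    Case 1: every component is working and none is maintained, so the
    components age independently.
    Case 2: the system is down, so maintained or failed components follow
    their maintenance chains while working components are frozen.
    """
    NC = len(i)
    if all(i[c] in working[c] and u[c] == 0 for c in range(NC)):
        # Case 1: product of independent single-component probabilities
        return prod(P(j[c], 0, i[c]) for c in range(NC))
    p = 1.0
    for c in range(NC):
        if u[c] == 1 or i[c] not in working[c]:
            p *= P(j[c], 1, i[c])               # follows its chain
        else:
            p *= 1.0 if j[c] == i[c] else 0.0   # frozen working component
    return p
```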
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.
Case 1
If all the components are working, no maintenance is decided and no failure happens, then a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)
Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC Cc

with

Cc = CCMc if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0 otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either a finite horizon model directly, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
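The backward recursion above can be checked numerically; the arc costs below are transcribed from the calculations (node letters kept as comments):

```python
# Arc costs C(k, i, j) transcribed from the calculation above
C = {
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,        # H, I, J -> terminal node
    (2, 0, 0): 2, (2, 0, 1): 5,                      # E -> H, I
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,        # F -> H, I, J
    (2, 2, 1): 1, (2, 2, 2): 2,                      # G -> I, J
    (1, 0, 0): 4, (1, 0, 1): 6,                      # B -> E, F
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,        # C -> E, F, G
    (1, 2, 1): 5, (1, 2, 2): 2,                      # D -> F, G
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,        # A -> B, C, D
}

J = {(4, 0): 0}                                      # terminal value phi(0) = 0
for k in range(3, -1, -1):                           # backward recursion
    for i in {i for (kk, i, _) in C if kk == k}:
        J[(k, i)] = min(C[(k, i, j)] + J[(k + 1, j)]
                        for (kk, ii, j) in C if (kk, ii) == (k, i))

print(J[(0, 0)])   # -> 8, the optimal cost from node A
```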
Reference List
[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Chapter 5
Finite Horizon Models
In this chapter a stochastic version of the dynamic programming model of Chapter 3 is presented. It introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.
5.1 Problem Formulation
Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 3: it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.
A stochastic dynamic programming model can be formulated as below
State Space
A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.
The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ ΩXk.
Decision Space
At each decision epoch the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).
Dynamics of the System and Transition Probabilities
Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):

Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N − 1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control are i and u at stage k. These probabilities can also depend on the stage:

Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
Cost Function
A cost is associated to each possible transition (ij) and action u The costs can alsodepend on the stage
$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$
If the transition (i, j) occurs at stage k when the decision is u, then a cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, the notation is simplified to $C(j, u, i)$.
A terminal cost $C_N(i)$ can be used to penalize deviation from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:
$$J^*(X_0) = \min_{U_k \in \Omega^U_k(X_k)} E\left[C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k)\right]$$

Subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), \quad k = 0, 1, \dots, N-1$
N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
$X_k$: State at stage k
$U_k$: Decision/action at stage k
$\omega_k(i, u)$: Probabilistic function of the disturbance
$C_k(i, u, j)$: Cost function
$C_N(i)$: Terminal cost for state i
$f_k(i, u, \omega)$: Dynamic function
$J^*_0(i)$: Optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} E\left[C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega))\right] \quad (5.1)$$
This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:
$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(i, u, j) \cdot \left[C_k(i, u, j) + J^*_{k+1}(j)\right] \quad (5.2)$$
$\Omega^X_k$: State space at stage k
$\Omega^U_k(i)$: Decision space at stage k for state i
$P_k(j, u, i)$: Transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
$J^*_N(i) = C_N(i) \quad \forall i \in \Omega^X_N$ (initialisation)

While $k \ge 0$ do

$\quad J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(i, u, j) \cdot \left[C_k(i, u, j) + J^*_{k+1}(j)\right] \quad \forall i \in \Omega^X_k$

$\quad U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(i, u, j) \cdot \left[C_k(i, u, j) + J^*_{k+1}(j)\right] \quad \forall i \in \Omega^X_k$

$\quad k \leftarrow k - 1$
u: Decision variable
$U^*_k(i)$: Optimal decision/action at stage k for state i
The recursion finishes when the first stage is reached
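The backward recursion above can be sketched in a few lines of Python. The three-state component, the action set (wait/replace) and all transition and cost numbers below are hypothetical assumptions for illustration; stationary dynamics are assumed so that P and C carry no stage index:

```python
import numpy as np

N = 12                      # number of stages (e.g. months)
S, A = 3, 2                 # 3 deterioration states; actions: 0 = wait, 1 = replace
# Hypothetical data: P[u][i, j] transition probabilities, C[u][i, j] transition costs
P = np.array([[[0.8, 0.2, 0.0],
               [0.0, 0.7, 0.3],
               [0.0, 0.0, 1.0]],   # wait: deterioration drifts toward state 2 (failed)
              [[1.0, 0.0, 0.0],
               [1.0, 0.0, 0.0],
               [1.0, 0.0, 0.0]]])  # replace: back to the "as new" state 0
C = np.zeros((A, S, S))
C[0, :, 2] = 50.0               # transitions into the failure state are expensive
C[1] = 10.0                     # replacement has a fixed cost
C_N = np.zeros(S)               # terminal cost

J = C_N.copy()                  # J_N(i) = C_N(i)  (initialisation)
U = np.zeros((N, S), dtype=int)
for k in range(N - 1, -1, -1):  # backward recursion, k = N-1, ..., 0
    Q = (P * (C + J)).sum(axis=2)   # Q[u, i] = sum_j P(i,u,j) * (C(i,u,j) + J_{k+1}(j))
    J = Q.min(axis=0)               # J*_k(i)
    U[k] = Q.argmin(axis=0)         # U*_k(i)

print(J)      # optimal expected cost-to-go from each initial state
print(U[0])   # optimal first-stage decision for each state
```

With these illustrative numbers, the recursion prescribes replacement in the failed state, as one would expect.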
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:

• N stages

• $N_X$ state variables, where the size of the set for each state variable is S

• $N_U$ control variables, where the size of the set for each control variable is A
The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem thus increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
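The growth can be checked with back-of-the-envelope arithmetic; the numbers below (a 52-stage problem, 10 levels per state variable, 3 actions per control variable) are purely illustrative assumptions:

```python
# Operation count N * S^(2*NX) * A^NU for a hypothetical problem.
def operations(N, S, NX, A, NU):
    return N * S ** (2 * NX) * A ** NU

for NX in (1, 2, 3, 4):
    print(NX, operations(52, 10, NX, 3, 1))
# each additional state variable multiplies the work by S^2 = 100
```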
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.
Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increase the complexity. The proposed model in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Also, if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions at offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).
This condition of loss of memory is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.
For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introduction chapter of [13] are recommended.
In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution of the problem has the form $\pi = \{\mu, \mu, \mu, \dots\}$, where $\mu$ is a function mapping the state space to the control space: for $i \in \Omega_X$, $\mu(i)$ is an admissible control for the state i, $\mu(i) \in \Omega_U(i)$.
The objective is to find the optimal policy $\mu^*$, which should minimize the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models

Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are incurred.
$$J^*(X_0) = \min_{\mu} E\left[\lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k)\right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \dots$

$\mu$: Decision policy
$J^*(i)$: Optimal cost-to-go function for state i
Discounted problems

Discounted IHSDP models have a cost function that is discounted by a discount factor $\alpha$ ($0 < \alpha < 1$): the cost incurred at stage k has the form $\alpha^k \cdot C_{ij}(u)$.
As $C_{ij}(u)$ is bounded, the infinite sum converges (decreasing geometric progression).
$$J^*(X_0) = \min_{\mu} E\left[\lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k)\right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \dots$

$\alpha$: Discount factor
Average cost per stage problems

Some infinite horizon problems can neither be represented with a cost-free termination state nor be discounted.
To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:
$$J^* = \min_{\mu} E\left[\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k)\right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \dots$
6.2 Optimality Equations
The optimality equations are formulated using the transition probabilities $P_{ij}(u)$.
The stationary policy $\mu^*$ that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
$$J_\mu(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[C_{ij}(u) + J_\mu(j)\right] \quad \forall i \in \Omega_X$$
$J_\mu(i)$: Cost-to-go function of policy $\mu$ starting from state i
$J^*(i)$: Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:
$$J_\mu(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[C_{ij}(u) + \alpha \cdot J_\mu(j)\right] \quad \forall i \in \Omega_X$$
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and $\frac{1}{1-\alpha}$.
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative criterion must be determined to stop the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
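For the discounted case, value iteration can be sketched as below (Python; the two-state, two-action MDP and all numbers are hypothetical, and the sup-norm test on successive iterates is used as the stopping criterion):

```python
import numpy as np

alpha = 0.9                                  # discount factor
# Hypothetical MDP: P[u][i, j] transition probabilities, C[u][i, j] stage costs
P = np.array([[[0.9, 0.1], [0.0, 1.0]],      # action 0: "wait"
              [[1.0, 0.0], [1.0, 0.0]]])     # action 1: "replace"
C = np.array([[[0.0, 100.0], [0.0, 100.0]],  # ending up failed costs 100
              [[20.0, 0.0], [20.0, 0.0]]])   # replacing costs 20

J = np.zeros(2)
for _ in range(10_000):
    Q = (P * (C + alpha * J)).sum(axis=2)    # Q[u, i]
    J_new = Q.min(axis=0)
    if np.max(np.abs(J_new - J)) < 1e-8:     # sup-norm stopping criterion
        J = J_new
        break
    J = J_new

policy = Q.argmin(axis=0)
print(J, policy)   # here: wait while working, replace on failure
```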
6.4 The Policy Iteration Algorithm
Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively. The process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy $\mu_0$. Then it can be described by the following steps:
Step 1: Policy Evaluation

If $\mu_{q+1} = \mu_q$, stop the algorithm. Else, $J_{\mu_q}(i)$ is calculated as the solution of the following linear system:

$$J_{\mu_q}(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot \left[C(j, \mu_q(i), i) + J_{\mu_q}(j)\right]$$
q: Iteration number for the policy iteration algorithm
This is the expected cost-to-go function of the system using the policy $\mu_q$.
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[C(j, u, i) + J_{\mu_q}(j)\right]$$

Go back to the policy evaluation step.

The process stops when $\mu_{q+1} = \mu_q$.
At each iteration, the algorithm improves the policy. If the initial policy $\mu_0$ is already good, the algorithm converges quickly to the optimal solution.
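The two steps can be sketched as follows (Python, shown here in the discounted variant so that the policy evaluation is always a solvable linear system; the two-state MDP and all numbers are hypothetical illustrations):

```python
import numpy as np

alpha = 0.9
P = np.array([[[0.9, 0.1], [0.0, 1.0]],      # action 0: "wait"
              [[1.0, 0.0], [1.0, 0.0]]])     # action 1: "replace"
C = np.array([[[0.0, 100.0], [0.0, 100.0]],
              [[20.0, 0.0], [20.0, 0.0]]])

mu = np.array([0, 0])                        # initial policy: always wait
for _ in range(100):
    # Step 1: policy evaluation -- solve (I - alpha * P_mu) J = c_mu exactly
    P_mu = P[mu, np.arange(2)]               # P_mu[i, j] = P(j | i, mu(i))
    c_mu = (P_mu * C[mu, np.arange(2)]).sum(axis=1)  # expected stage cost under mu
    J = np.linalg.solve(np.eye(2) - alpha * P_mu, c_mu)
    # Step 2: policy improvement
    Q = (P * (C + alpha * J)).sum(axis=2)
    mu_new = Q.argmin(axis=0)
    if np.array_equal(mu_new, mu):           # mu is a solution of its own improvement
        break
    mu = mu_new

print(mu, J)
```

With these numbers the algorithm terminates after a handful of iterations, on the same policy value iteration would find.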
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu_k}(i)$ that must be chosen higher than the real value $J_{\mu_k}(i)$.
While $m \ge 0$ do

$\quad J^m_{\mu_k}(i) = \sum_{j \in \Omega_X} P(j, \mu_k(i), i) \cdot \left[C(j, \mu_k(i), i) + J^{m+1}_{\mu_k}(j)\right] \quad \forall i \in \Omega_X$

$\quad m \leftarrow m - 1$
m: Number of iterations left in the evaluation step of modified policy iteration
The algorithm stops when $m = 0$, and $J_{\mu_k}$ is approximated by $J^0_{\mu_k}$.
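A sketch of this evaluation step on the same kind of hypothetical discounted MDP: instead of a linear solve, M value-iteration sweeps under the fixed policy are applied (in the discounted setting any initialisation converges; the "start above the true value" initialisation matters for the undiscounted guarantee in the text):

```python
import numpy as np

alpha, M = 0.9, 200
P = np.array([[[0.9, 0.1], [0.0, 1.0]],
              [[1.0, 0.0], [1.0, 0.0]]])
C = np.array([[[0.0, 100.0], [0.0, 100.0]],
              [[20.0, 0.0], [20.0, 0.0]]])
mu = np.array([0, 1])                      # the (fixed) policy to evaluate

P_mu = P[mu, np.arange(2)]                 # P_mu[i, j] = P(j | i, mu(i))
c_mu = (P_mu * C[mu, np.arange(2)]).sum(axis=1)

J = np.full(2, 1e4)                        # initialised above the true values
for _ in range(M):                         # M backup sweeps instead of a linear solve
    J = c_mu + alpha * P_mu @ J            # J^m = c_mu + alpha * P_mu * J^{m+1}

J_exact = np.linalg.solve(np.eye(2) - alpha * P_mu, c_mu)
print(J, J_exact)                          # the sweeps approach the exact evaluation
```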
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy $\mu$ and a state $\bar{X} \in \Omega_X$, there is a unique $\lambda_\mu$ and vector $h_\mu$ such that

$$h_\mu(\bar{X}) = 0$$

$$\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot \left[C(j, \mu(i), i) + h_\mu(j)\right] \quad \forall i \in \Omega_X$$
This $\lambda_\mu$ is the average cost-to-go for the stationary policy $\mu$. The average cost-to-go is the same for all starting states.
The optimal average cost $\lambda^*$ and the optimal policy satisfy the Bellman equation:

$$\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[C(j, u, i) + h^*(j)\right] \quad \forall i \in \Omega_X$$

$$\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[C(j, u, i) + h^*(j)\right] \quad \forall i \in \Omega_X$$
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. $\bar{X}$ is an arbitrary state and $h^0(i)$ is chosen arbitrarily:
$$H^k = \min_{u \in \Omega_U(\bar{X})} \sum_{j \in \Omega_X} P(j, u, \bar{X}) \cdot \left[C(j, u, \bar{X}) + h^k(j)\right]$$

$$h^{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[C(j, u, i) + h^k(j)\right] - H^k \quad \forall i \in \Omega_X$$

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[C(j, u, i) + h^k(j)\right] \quad \forall i \in \Omega_X$$
The sequence $h^k$ converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
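A minimal Python sketch of relative value iteration on a hypothetical two-state, two-action unichain MDP (state 0 is taken as the reference state $\bar{X}$; all numbers are illustrative assumptions):

```python
import numpy as np

# Hypothetical unichain MDP: every stationary policy has a single ergodic class.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],      # action 0: "wait" (rare self-repair)
              [[1.0, 0.0], [1.0, 0.0]]])     # action 1: "replace"
C = np.array([[[0.0, 100.0], [100.0, 100.0]],  # being/ending up failed costs 100
              [[20.0, 0.0], [20.0, 0.0]]])     # replacing costs 20
ref = 0                                      # arbitrary reference state X_bar

h = np.zeros(2)
for _ in range(10_000):
    Q = (P * (C + h)).sum(axis=2)            # Q[u, i] = sum_j P(j,u,i)[C(j,u,i) + h(j)]
    H = Q[:, ref].min()                      # offset computed at the reference state
    h_new = Q.min(axis=0) - H                # h^{k+1}(i); h(ref) stays 0
    if np.max(np.abs(h_new - h)) < 1e-10:
        h = h_new
        break
    h = h_new

mu = Q.argmin(axis=0)
print(H, mu)   # H converges to the optimal average cost per stage
```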
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: $\bar{X}$ can be chosen arbitrarily.
Step 1: Policy Evaluation

If $\lambda_{q+1} = \lambda_q$ and $h_{q+1}(i) = h_q(i) \; \forall i \in \Omega_X$, stop the algorithm. Else, solve the system of equations:

$$h_q(\bar{X}) = 0$$

$$\lambda_q + h_q(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot \left[C(j, \mu_q(i), i) + h_q(j)\right] \quad \forall i \in \Omega_X$$
Step 2: Policy Improvement

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[C(j, u, i) + h_q(j)\right] \quad \forall i \in \Omega_X$$

$$q \leftarrow q + 1$$
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that cannot be included in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, for the discounted IHSDP, with optimality equation

$$J_\mu(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[C(j, u, i) + \alpha \cdot J_\mu(j)\right] \quad \forall i \in \Omega_X,$$

$J_\mu(i)$ is the solution of the following linear programming model:

Minimize $\sum_{i \in \Omega_X} J_\mu(i)$

Subject to $J_\mu(i) \ge \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[C(j, u, i) + \alpha \cdot J_\mu(j)\right] \quad \forall u, i$
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].
Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy $\mu_0$ is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and the actions are not taken continuously (that kind of problem refers to optimal control theory).
SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization, since they allow a choice of the inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Process - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.
The aim of this chapter is to give an overview of RL. For further interest, the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], as well as the article [23], are recommended.
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form $(X_k, X_{k+1}, U_k, C_k)$.
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, obtained from simulation or real-life experience. A sample has the form $(X_k, X_{k+1}, U_k, C_k)$: $X_{k+1}$ is the observed state after choosing the control $U_k$ in state $X_k$, and $C_k = C(X_k, X_{k+1}, U_k)$ is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities $P(j, u, i)$ and costs $C(j, u, i)$, if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy $\mu$ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory $(X_0, \dots, X_N)$ has been generated according to the policy $\mu$, and the sequence of transition costs $C(X_k, X_{k+1}) = C(X_k, X_{k+1}, \mu(X_k))$ has been observed.
The cost-to-go resulting from the trajectory, starting from the state $X_k$, is

$$V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})$$

$V(X_k)$: Cost-to-go of a trajectory starting from state $X_k$
If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, then $J(i)$ can be estimated by

$$\tilde{J}(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)$$

$V(i_m)$: Cost-to-go of the trajectory starting from state i after the m-th visit
A recursive form of the method can be formulated:

$$\tilde{J}(i) := \tilde{J}(i) + \gamma \cdot \left[V(i_m) - \tilde{J}(i)\right], \quad \gamma = 1/m,$$

with m the number of the trajectory.
From a trajectory point of view:

$$\tilde{J}(X_k) := \tilde{J}(X_k) + \gamma_{X_k} \cdot \left[V(X_k) - \tilde{J}(X_k)\right]$$

with $\gamma_{X_k}$ corresponding to $1/m$, where m is the number of times $X_k$ has already been visited by trajectories.
With the preceding algorithm, $V(X_k)$ must be calculated from the whole trajectory, and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation $V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1})$.
At each transition of the trajectory, the cost-to-go function of the states visited so far is updated. Assume that the l-th transition is being generated. Then $\tilde{J}(X_k)$ is updated for all the states that have been visited previously during the trajectory:

$$\tilde{J}(X_k) := \tilde{J}(X_k) + \gamma_{X_k} \cdot \left[C(X_l, X_{l+1}) + \tilde{J}(X_{l+1}) - \tilde{J}(X_l)\right] \quad \forall k = 0, \dots, l$$
TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant $\lambda < 1$ is introduced:

$$\tilde{J}(X_k) := \tilde{J}(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot \left[C(X_l, X_{l+1}) + \tilde{J}(X_{l+1}) - \tilde{J}(X_l)\right] \quad \forall k = 0, \dots, l$$
Note that TD(1) is the same as the policy evaluation by simulation. Another special case is $\lambda = 0$. The TD(0) algorithm is:

$$\tilde{J}(X_k) := \tilde{J}(X_k) + \gamma_{X_k} \cdot \left[C(X_k, X_{k+1}) + \tilde{J}(X_{k+1}) - \tilde{J}(X_k)\right]$$
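A sketch of TD(0) policy evaluation on a hypothetical stochastic shortest path chain (the chain, its costs, and the 1/m step-size schedule are illustrative assumptions; for this chain the true values are $J(0) = 2.5$ and $J(1) = 1.25$):

```python
import random

random.seed(0)
TERMINAL = 2
# Hypothetical fixed policy on a 3-state shortest-path chain:
# from state i < 2 the system moves to i+1 with prob. 0.8, stays with prob. 0.2,
# and each transition costs 1.
def step(i):
    j = i + 1 if random.random() < 0.8 else i
    return j, 1.0

J = {0: 0.0, 1: 0.0, TERMINAL: 0.0}
visits = {0: 0, 1: 0}

for episode in range(20_000):
    x = 0                              # every trajectory starts in state 0
    while x != TERMINAL:
        x_next, cost = step(x)
        visits[x] += 1
        gamma = 1.0 / visits[x]        # step size 1/m, m = number of visits so far
        # TD(0) update: J(x) <- J(x) + gamma * [C(x, x') + J(x') - J(x)]
        J[x] += gamma * (cost + J[x_next] - J[x])
        x = x_next

print(round(J[0], 2), round(J[1], 2))  # close to the true values 2.5 and 1.25
```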
Q-factors: Once $J_{\mu_k}(i)$ has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

$$Q_{\mu_k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[C(j, u, i) + J_{\mu_k}(j)\right]$$

Note that $C(j, u, i)$ must be known. The improved policy is

$$\mu_{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q_{\mu_k}(i, u)$$

This is in fact an approximate version of the policy iteration algorithm, since $J_{\mu_k}$ and $Q_{\mu_k}$ have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[C(j, u, i) + J^*(j)\right] \quad (7.1)$$
The optimality equation can be rewritten in terms of Q-factors:

$$J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u) \quad (7.2)$$
By combining the two equations, we obtain

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v)\right] \quad (7.3)$$
$Q^*(i, u)$ is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
$Q(i, u)$ can be initialized arbitrarily.

For each sample $(X_k, X_{k+1}, U_k, C_k)$ do:

$\quad U_k = \arg\min_{u \in \Omega_U(X_k)} Q(X_k, u)$

$\quad Q(X_k, U_k) := (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot \left[C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u)\right]$

with $\gamma$ defined as for TD.
The trade-off exploration/exploitation: The convergence of the algorithm to the optimal solution would require that all pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, where a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
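A Q-learning sketch with an ε-greedy exploration/exploitation trade-off, on the same kind of hypothetical two-state component used above (discounting is added here to keep the Q-factors bounded; every number, including ε, is an illustrative assumption):

```python
import random

random.seed(1)
alpha_discount = 0.9
# Hypothetical component: states 0 (working) / 1 (failed); actions 0 (wait) / 1 (replace)
P = {(0, 0): [0.9, 0.1], (1, 0): [0.0, 1.0],
     (0, 1): [1.0, 0.0], (1, 1): [1.0, 0.0]}
def cost(i, u, j):
    if u == 1:
        return 20.0                        # replacement cost
    return 100.0 if j == 1 else 0.0        # ending up failed is expensive

Q = {(i, u): 0.0 for i in (0, 1) for u in (0, 1)}
visits = {k: 0 for k in Q}
epsilon = 0.1                              # exploration rate

x = 0
for _ in range(200_000):
    if random.random() < epsilon:          # exploration phase: try a random control
        u = random.choice((0, 1))
    else:                                  # exploitation phase: follow the greedy policy
        u = min((0, 1), key=lambda a: Q[(x, a)])
    x_next = random.choices((0, 1), weights=P[(x, u)])[0]
    c = cost(x, u, x_next)
    visits[(x, u)] += 1
    gamma = 1.0 / visits[(x, u)]           # step size as for TD
    target = c + alpha_discount * min(Q[(x_next, 0)], Q[(x_next, 1)])
    Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
    x = x_next

greedy = {i: min((0, 1), key=lambda a: Q[(i, a)]) for i in (0, 1)}
print(greedy)   # learned policy: wait while working, replace on failure
```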
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section for each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation, using direct learning.
7.4 Supervised Learning
With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function $J_\mu(i)$. It will be replaced by a suitable approximation $\tilde{J}(i, r)$, where r is a vector that has to be optimized based on the available samples of $J_\mu$. In the table representation investigated previously, $J_\mu(i)$ was stored for every value of i. With an approximation structure, only the vector r is stored.
Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, $J_\mu(i) - \tilde{J}(i, r)$.
There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics, for example.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
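As a toy illustration of the approximation step, the sketch below fits a linear-in-parameters approximation $\tilde{J}(i, r) = r_0 + r_1 i + r_2 i^2$ to noisy cost-to-go samples by least squares. The quadratic target, the noise level, and the feature choice are assumptions made for the example, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical noisy "samples" of a cost-to-go function over 50 states,
# as they could come out of simulated trajectories.
states = np.arange(50, dtype=float)
true_J = 0.05 * states**2 + 2.0 * states + 10.0
samples = true_J + rng.normal(0.0, 5.0, size=states.shape)

# Features phi(i) = (1, i, i^2); only the parameter vector r is stored,
# not a table of J(i) for every state.
Phi = np.column_stack([np.ones_like(states), states, states**2])
r, *_ = np.linalg.lstsq(Phi, samples, rcond=None)

J_approx = Phi @ r
print(r)                                   # three numbers replace a 50-entry table
print(np.max(np.abs(J_approx - true_J)))   # generalization error over the state space
```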
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built, using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDPs have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance or major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is discussed. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
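As a small illustration of the RL approach advocated in [24], the following sketch runs tabular Q-learning on a hypothetical two-state run/replace process. The states, rewards and failure probability are all invented for illustration; [24] does not specify such a model:

```python
import random

random.seed(0)

# Hypothetical maintenance MDP: states 0 = working, 1 = failed (invented).
# Actions: 0 = keep running, 1 = replace.
def step(s, a):
    if a == 1:                      # replacement: pay a cost, back to working
        return 0, -0.5
    if s == 0:                      # running a working unit: produce, may fail
        return (1, 1.0) if random.random() < 0.3 else (0, 1.0)
    return 1, 0.0                   # running a failed unit: no production

alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration
Q = [[0.0, 0.0], [0.0, 0.0]]
s = 0
for _ in range(50000):
    # epsilon-greedy action selection
    a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda u: Q[s][u])
    s2, r = step(s, a)
    # standard Q-learning update from the sampled transition
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    s = s2

# Learned greedy policy per state (no model of the dynamics was ever used)
print([max((0, 1), key=lambda u: Q[x][u]) for x in (0, 1)])
```

The point of the sketch is the one made in the text: the learner never uses the transition probabilities explicitly, only sampled transitions, which is what makes RL attractive when no model of the plant exists.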
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Advantages/Disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model
  Methods: classical methods for MDP; possible approaches:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI) can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; Policy Iteration (PI) is faster in general
  - Shortest path: Linear Programming allows possible additional constraints; state space more limited than with VI & PI

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval; complex (average cost-to-go approach)
  Possible application in maintenance optimization: inspection-based maintenance
  Method: same as MDP

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces
  Possible application in maintenance optimization: same as classical MDP methods, for larger systems
  Methods: TD-learning, Q-learning; can work without an explicit model
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
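The backward value iteration used for such finite horizon models can be sketched generically as follows. The three-state instance, its costs and its probabilities are invented purely for illustration; the structure (terminal cost, backward recursion over stages, minimization over decisions) is the one used throughout this chapter:

```python
import numpy as np

# Generic finite-horizon SDP solved by backward induction:
#   J_N(x) = terminal(x)
#   J_k(x) = min_u  sum_x' P[u][x, x'] * (C[u][x, x'] + J_{k+1}(x'))
# Tiny invented instance: 3 states, 2 decisions, horizon N = 4.
N = 4
P = {0: np.array([[0.7, 0.3, 0.0],      # u = 0: system ages, may fail
                  [0.0, 0.6, 0.4],
                  [0.0, 0.0, 1.0]]),
     1: np.array([[1.0, 0.0, 0.0],      # u = 1: maintain, back to state 0
                  [1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0]])}
C = {0: np.zeros((3, 3)),               # no action: no direct cost
     1: np.full((3, 3), 5.0)}           # maintenance: fixed cost 5
C[0][:, 2] = 20.0                       # entering the failure state is expensive
terminal = np.array([0.0, 2.0, 10.0])   # terminal penalty per state

J = terminal.copy()
policy = []
for k in range(N - 1, -1, -1):
    # Q[u, x] = expected cost of decision u in state x at stage k
    Q = np.stack([(P[u] * (C[u] + J)).sum(axis=1) for u in (0, 1)])
    policy.append(Q.argmin(axis=0))
    J = Q.min(axis=0)
policy.reverse()   # policy[k][x] = optimal decision at stage k in state x
print(J)           # expected cost-to-go from stage 0, per starting state
```

Note that, unlike the infinite horizon case, the resulting policy is stage-dependent, which is exactly what permits the non-stationary electricity prices of the proposed model.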
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE: Number of electricity scenarios
NW: Number of working states for the component
NPM: Number of preventive maintenance states for the component
NCM: Number of corrective maintenance states for the component
Costs
CE(s, k): Electricity cost at stage k for electricity state s
CI: Cost per stage for interruption
CPM: Cost per stage of preventive maintenance
CCM: Cost per stage of corrective maintenance
CN(i): Terminal cost if the component is in state i
Variables
i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage
State and Control Space
x1k: Component state at stage k
x2k: Electricity state at stage k
Probability function
λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state Wi
Sets
Ωx1: Component state space
Ωx2: Electricity state space
ΩU(i): Decision space for state i
States notations
W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)
Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case, Tmax can correspond, for example, to the time such that λ(t) > 50% if t > Tmax. This second approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
[State transition diagram: working states W0, ..., W4 with ageing transitions of probability 1 − Ts·λ(q), failure transitions Ts·λ(q) to CM1, the CM chain returning to W0, and preventive maintenance transitions to PM1.]

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
[Plot of electricity prices (roughly 200 to 500 SEK/MWh) for Scenarios 1 to 3 over stages k−1, k, k+1.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}; ∅ else
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).
The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state Pk(j2, i2) are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                           u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., NW − 1}     0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., NW − 1}     0   CM1     λ(Wq)
WNW                          0   WNW     1 − λ(WNW)
WNW                          0   CM1     λ(WNW)
Wq, q ∈ {0, ..., NW}         1   PM1     1
PMq, q ∈ {1, ..., NPM − 2}   ∅   PMq+1   1
PMNPM−1                      ∅   W0      1
CMq, q ∈ {1, ..., NCM − 2}   ∅   CMq+1   1
CMNCM−1                      ∅   W0      1
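Under the assumptions of Section 9.1.3, the transition matrices of Table 9.1 can be assembled programmatically. A sketch, using the sizes NW = 4, NPM = 2, NCM = 3 of the Figure 9.1 example; the stage length and the failure rate function are invented for illustration:

```python
import numpy as np

Ts = 1.0                        # stage length, assumption for illustration
NW, NPM, NCM = 4, 2, 3          # sizes as in the Figure 9.1 example
lam = lambda q: 0.02 * (q + 1)  # invented increasing failure rate lambda(W_q)

# State ordering: W0..W_NW, PM1..PM_{NPM-1}, CM1..CM_{NCM-1}
states = [f"W{q}" for q in range(NW + 1)] \
       + [f"PM{q}" for q in range(1, NPM)] \
       + [f"CM{q}" for q in range(1, NCM)]
idx = {s: i for i, s in enumerate(states)}
n = len(states)

P0 = np.zeros((n, n))   # u = 0: no preventive maintenance
P1 = np.zeros((n, n))   # u = 1: preventive maintenance (only from W states)
for q in range(NW + 1):
    nxt = f"W{min(q + 1, NW)}"            # W_NW absorbs further ageing
    P0[idx[f"W{q}"], idx[nxt]] = 1 - Ts * lam(q)
    P0[idx[f"W{q}"], idx["CM1"]] = Ts * lam(q)
    P1[idx[f"W{q}"], idx["PM1"]] = 1.0
for q in range(1, NPM):                   # PM chain ends in W0 (PM_NPM = W0)
    P0[idx[f"PM{q}"], idx[f"PM{q + 1}" if q + 1 < NPM else "W0"]] = 1.0
for q in range(1, NCM):                   # CM chain ends in W0 (CM_NCM = W0)
    P0[idx[f"CM{q}"], idx[f"CM{q + 1}" if q + 1 < NCM else "W0"]] = 1.0

assert np.allclose(P0.sum(axis=1), 1)     # each row is a probability distribution
```

The rows of P1 for PM and CM states are left at zero because the decision u = 1 is not available there (the decision space is ∅).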
Table 9.2: Example of transition matrices for electricity scenarios

P1E =
  1  0  0
  0  1  0
  0  0  1

P2E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
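The non-stationary electricity transitions of Tables 9.2 and 9.3 can be represented as a stage-indexed list of transition matrices; a sketch using the example values above:

```python
import numpy as np

P1E = np.eye(3)                          # stable scenarios
P2E = np.full((3, 3), 1 / 3)             # fully transient (summer)
P3E = np.array([[0.6, 0.2, 0.2],
                [0.2, 0.6, 0.2],
                [0.2, 0.2, 0.6]])        # moderately transient

# Stage schedule of Table 9.3 (12-stage horizon)
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def electricity_transition(k, i2, j2):
    """P_k(j2, i2): probability of moving from scenario i2 to j2 at stage k."""
    return schedule[k][i2, j2]

# Scenario distribution after the 12 stages, starting from scenario 0 ("dry"):
d = np.array([1.0, 0.0, 0.0])
for k in range(12):
    d = d @ schedule[k]
print(d)
```

The fully mixing summer matrix P2E erases the scenario information, so whatever the starting scenario, the distribution at the end of the horizon is uniform, which matches the dry-year/wet-year interpretation of Section 9.1.1.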
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost noted CN could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                           u   j1      Ck(j, u, i)
Wq, q ∈ {0, ..., NW − 1}     0   Wq+1    G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW − 1}     0   CM1     CI + CCM
WNW                          0   WNW     G · Ts · CE(i2, k)
WNW                          0   CM1     CI + CCM
Wq                           1   PM1     CI + CPM
PMq, q ∈ {1, ..., NPM − 2}   ∅   PMq+1   CI + CPM
PMNPM−1                      ∅   W0      CI + CPM
CMq, q ∈ {1, ..., NCM − 2}   ∅   CMq+1   CI + CCM
CMNCM−1                      ∅   W0      CI + CCM
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.
This could be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC: Number of components
NWc: Number of working states for component c
NPMc: Number of preventive maintenance states for component c
NCMc: Number of corrective maintenance states for component c
Costs
CPMc: Cost per stage of preventive maintenance for component c
CCMc: Cost per stage of corrective maintenance for component c
CNc(i): Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}: State of component c at the current stage
iNC+1: Electricity state at the current stage
jc, c ∈ {1, ..., NC}: State of component c for the next stage
jNC+1: Electricity state for the next stage
uc, c ∈ {1, ..., NC}: Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}: State of component c at stage k
xc: A component state
xNC+1k: Electricity state at stage k
uck: Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc: State space for component c
ΩxNC+1: Electricity state space
Ωuc(ic): Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC} represents the state of component c, and xNC+1k represents the electricity state.
Component space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}; ∅ else
9.2.4.3 Transition Probability
The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state Pk(jNC+1, iNC+1) are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏(c=1..NC) P(jc, 0, ic)
Case 2

If one of the components is in maintenance, or if preventive maintenance is decided for some component:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏(c=1..NC) Pc

with Pc = P(jc, 1, ic) if uc = 1 or ic ∉ {W1, ..., WNWc};
     Pc = 1 if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic;
     Pc = 0 else.
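The case distinction above can be sketched as follows. The per-component probabilities P(jc, uc, ic) are assumed to come from one-component matrices like those of Table 9.1; the two identical three-state components below (W0, W1, CM1) are invented for illustration:

```python
import numpy as np

# Per-component one-step probabilities, indexed Pc[c][u][i, j].
# Invented 3-state components: state 0 = W0, 1 = W1, 2 = CM1.
Pc = [{0: np.array([[0.0, 0.9, 0.1],    # u = 0: age W0 -> W1, or fail
                    [0.0, 0.8, 0.2],    # W1 stays/fails
                    [1.0, 0.0, 0.0]]),  # CM1 -> W0 (repair completes)
      1: np.array([[1.0, 0.0, 0.0],     # u = 1: replacement -> W0
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])} for _ in range(2)]
WORKING = {0, 1}   # states in which a component is working (and ages)

def joint_transition(i, u, j):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) for the series system."""
    all_up = all(ic in WORKING and uc == 0 for ic, uc in zip(i, u))
    prob = 1.0
    for c, (ic, uc, jc) in enumerate(zip(i, u, j)):
        if all_up:                          # case 1: every component ages
            prob *= Pc[c][0][ic, jc]
        elif uc == 1 or ic not in WORKING:  # case 2: maintained parts evolve
            # (CM rows live in the u = 0 matrix here; decision there is empty)
            prob *= Pc[c][1 if uc == 1 else 0][ic, jc]
        else:                               # system down: working parts frozen
            prob *= 1.0 if jc == ic else 0.0
    return prob
```

For example, with both components new and no maintenance, the probability that both age one step is 0.9 · 0.9 = 0.81 (case 1); with component 2 in CM1, component 1 stays put with probability 1 while the repair finishes (case 2).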
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ(c=1..NC) Cc

with Cc = CCMc if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1;
     Cc = CPMc if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1;
     Cc = 0 else.
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:
• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality, there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount factor, the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The methods have until now mainly been applied to optimal control, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
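The hand computation above can be checked mechanically by running the same backward recursion over the arc costs C(k, i, j) of the example:

```python
# Arc costs C[(k, i, j)] of the shortest path example: stage, node, next node
C = {(3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
     (2, 0, 0): 2, (2, 0, 1): 5,
     (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
     (2, 2, 1): 1, (2, 2, 2): 2,
     (1, 0, 0): 4, (1, 0, 1): 6,
     (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
     (1, 2, 1): 5, (1, 2, 2): 2,
     (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3}

J = {(4, 0): 0}                                   # terminal cost phi(0) = 0
for k in range(3, -1, -1):                        # backward over stages 3..0
    nodes = {i for (kk, i, _) in C if kk == k}
    for i in nodes:
        succ = [(j, c) for (kk, ii, j), c in C.items() if kk == k and ii == i]
        J[(k, i)] = min(c + J[(k + 1, j)] for j, c in succ)

print(J[(0, 0)])  # prints 8, the total shortest path cost from A
```

The intermediate values J[(2, 0)] = 6, J[(2, 1)] = 5, J[(2, 2)] = 3, J[(1, 0)] = 10, J[(1, 1)] = 6, J[(1, 2)] = 5 match the hand computation stage by stage.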
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006 (RAMS'06), pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M. L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.
[30] M. K. C. Marwali and S. M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA '99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R. P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K. S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K. S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Alagar Rangan, Dimple Ahyagarajan, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L. M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C. L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R. E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
the stage: u ∈ ΩUk(i).
Dynamics of the System and Transition Probabilities
In contrast to the deterministic case, the state transition does not depend only on the control used but also on a disturbance ω = ωk(i, u):
Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N − 1
The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k + 1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:
Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:
P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)
In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
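As a minimal sketch of this last remark (the three-state component, its probabilities and the two controls are invented for illustration, not taken from this thesis), fixing one control per state turns the transition probabilities P(j, u, i) into an ordinary Markov chain:

```python
# Hypothetical 3-state component: 0 = good, 1 = worn, 2 = failed.
# P[u][i][j] = probability of moving from state i to state j under control u.
P = {
    "wait":   [[0.8, 0.15, 0.05],
               [0.0, 0.7,  0.3],
               [0.0, 0.0,  1.0]],
    "repair": [[1.0, 0.0, 0.0],
               [0.9, 0.1, 0.0],
               [0.8, 0.2, 0.0]],
}

def markov_chain_of(policy, P):
    """Transition matrix of the Markov chain induced by fixing one control
    per state: P_mu[i][j] = P(j, mu(i), i)."""
    return [P[policy[i]][i] for i in range(len(policy))]

mu = ["wait", "wait", "repair"]      # repair only after failure
P_mu = markov_chain_of(mu, P)
# every row of the induced chain is a probability distribution
assert all(abs(sum(row) - 1.0) < 1e-9 for row in P_mu)
```

The resulting matrix P_mu can then be analyzed with standard Markov chain tools (stationary distribution, expected time to failure, and so on).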
Cost Function
A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

Ck(j, u, i) = Ck(Xk+1 = j, Uk = u, Xk = i)
If the transition (i, j) occurs at stage k when the decision is u, then the cost Ck(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).
A terminal cost CN(i) can be used to penalize deviations from a desired terminal state.
Objective Function
The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (the cost-to-go function) J*(X0), where X0 is the initial state of the system:
J*(X0) = min_{Uk∈ΩUk(Xk)} E[ CN(XN) + Σ_{k=0}^{N−1} Ck(Xk+1, Uk, Xk) ]

Subject to Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, ..., N − 1
N    Number of stages
k    Stage
i    State at the current stage
j    State at the next stage
Xk    State at stage k
Uk    Decision (action) at stage k
ωk(i, u)    Probabilistic function of the disturbance
Ck(j, u, i)    Cost function
CN(i)    Terminal cost for state i
fk(i, u, ω)    Dynamic function
J*0(i)    Optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:

J*k(i) = min_{u∈ΩUk(i)} E[ Ck(fk(i, u, ω), u, i) + J*k+1(fk(i, u, ω)) ]    (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*k(i) = min_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]    (5.2)
ΩXk    State space at stage k
ΩUk(i)    Decision space at stage k for state i
Pk(j, u, i)    Transition probability function
5.3 Value Iteration Method
The Value Iteration (VI) algorithm for SDP problems is directly based on Equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.
J*N(i) = CN(i)  ∀i ∈ ΩXN    (Initialization)

While k ≥ 0 do:

J*k(i) = min_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]  ∀i ∈ ΩXk

U*k(i) = argmin_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]  ∀i ∈ ΩXk

k ← k − 1
u    Decision variable
U*k(i)    Optimal decision (action) at stage k for state i
The recursion finishes when the first stage is reached
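The backward recursion above can be sketched in code. The three-state replacement model below (states, probabilities and costs) is a toy example invented for illustration and is not taken from the thesis:

```python
N = 10                         # number of stages
states = [0, 1, 2]             # 0 = new, 1 = worn, 2 = failed
controls = ["keep", "replace"]

def P(j, u, i):
    """Transition probabilities Pk(j, u, i), stationary in this sketch."""
    if u == "replace":
        return 1.0 if j == 0 else 0.0
    rows = {0: [0.7, 0.25, 0.05], 1: [0.0, 0.6, 0.4], 2: [0.0, 0.0, 1.0]}
    return rows[i][j]

def C(j, u, i):
    """Transition costs: 10 for a replacement, plus 5 for ending up failed."""
    return (10.0 if u == "replace" else 0.0) + (5.0 if j == 2 else 0.0)

J = {i: 0.0 for i in states}   # terminal cost CN(i) = 0
policy = []
for k in range(N - 1, -1, -1): # backward recursion over the stages
    Jk, Uk = {}, {}
    for i in states:
        q = {u: sum(P(j, u, i) * (C(j, u, i) + J[j]) for j in states)
             for u in controls}
        Uk[i] = min(q, key=q.get)      # optimal decision U*k(i)
        Jk[i] = q[Uk[i]]               # optimal cost-to-go J*k(i)
    J, policy = Jk, [Uk] + policy
```

Here policy[k][i] is the optimal decision at stage k in state i. With these numbers, the failed state is replaced at every stage except the last two, where paying for a replacement no longer pays off before the horizon ends.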
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic problem with:

• N stages,

• NX state variables, where the size of the set for each state variable is S,

• NU control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·NX) · A^NU). The complexity of the problem thus increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
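As a back-of-the-envelope illustration (the numbers are invented), the operation count above grows hopelessly fast when components are added:

```python
def vi_operations(N, S, NX, A, NU):
    """Rough operation count N * S^(2*NX) * A^NU of the value iteration
    algorithm for a finite horizon SDP."""
    return N * S ** (2 * NX) * A ** NU

# One component: a single state and control variable (52 weekly stages,
# 10 deterioration levels, 2 possible actions).
small = vi_operations(N=52, S=10, NX=1, A=2, NU=1)     # 10 400 operations
# Ten components: the same algorithm needs on the order of 5e24 operations.
large = vi_operations(N=52, S=10, NX=10, A=2, NU=10)
```

The jump from roughly ten thousand to roughly 10^24 operations is exactly the curse of dimensionality discussed above.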
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time, so a possible state variable for the component is its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used to complement each other.
Of course, maintenance states should be considered in both cases. It could also be possible to have different types of failure states, such as major and minor failures. Minor failures could be cleared by repair, while after a major failure a component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption, so if the consumption is low, some generating units are stopped. This time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions on offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamic of the system only depends on the current state of the system (and possibly on time, if the system dynamic is not stationary).
This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible, if the system dynamic depends on a few preceding states, to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamic of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamic of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov decision process. For more details and proofs of the convergence of the algorithms, [36] or the introduction chapter of [13] are recommended.
In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ ΩX, μ(i) is an admissible control for the state i, μ(i) ∈ ΩU(i).
The objective is to find the optimal policy μ*, that is, the policy that minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (a cost-free termination state) that is inevitably reached. When this state is reached, the system remains in it and no further costs are paid:
J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

Subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, 2, ...
μ    Decision policy
J*(i)    Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1): the cost incurred at stage k has the form α^k · Cij(u). As Cij(u) is bounded, the infinite sum converges (it is bounded by a decreasing geometric progression):
J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(Xk+1, μ(Xk), Xk) ]

Subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, 2, ...
α Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounted costs.
To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:
J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

Subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, 2, ...
6.2 Optimality Equations
The optimality equations are formulated using the transition probabilities, written here in the short form Pij(u) = P(j, u, i), with Cij(u) the corresponding cost.

The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + J*(j)]  ∀i ∈ ΩX
Jμ(i)    Cost-to-go function of policy μ starting from state i
J*(i)    Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + α · J*(j)]  ∀i ∈ ΩX
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can be shown that the algorithm does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a relative criterion must be determined to stop the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is applied iteratively; the process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ0. It can then be described by the following steps:
Step 1: Policy Evaluation

If μq+1 = μq, stop the algorithm. Otherwise, Jμq(i), the solution of the following linear system, is calculated:

Jμq(i) = Σ_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + Jμq(j)]  ∀i ∈ ΩX

q    Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μq.
Step 2: Policy Improvement

A new policy is obtained using a value iteration step:

μq+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jμq(j)]  ∀i ∈ ΩX
Go back to the policy evaluation step.
The process stops when μq+1 = μq.
At each iteration the algorithm improves the policy. If the initial policy μ0 is already good, then the algorithm will converge quickly to the optimal solution.
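A compact sketch of the two-step loop for a discounted problem follows. The three-state replacement model (probabilities, costs, discount factor) is invented for illustration, and, for simplicity, the policy evaluation linear system is solved here by repeated fixed-point sweeps rather than by a direct linear solver:

```python
states, controls, alpha = [0, 1, 2], ["keep", "replace"], 0.9

def P(j, u, i):
    """Transition probabilities P(j, u, i) of the toy model."""
    if u == "replace":
        return 1.0 if j == 0 else 0.0
    rows = {0: [0.7, 0.25, 0.05], 1: [0.0, 0.6, 0.4], 2: [0.0, 0.0, 1.0]}
    return rows[i][j]

def C(j, u, i):
    """Costs: 10 for a replacement, plus 5 for ending up in the failed state."""
    return (10.0 if u == "replace" else 0.0) + (5.0 if j == 2 else 0.0)

def evaluate(mu):
    """Step 1: policy evaluation, here by repeated fixed-point sweeps."""
    J = {i: 0.0 for i in states}
    for _ in range(2000):
        J = {i: sum(P(j, mu[i], i) * (C(j, mu[i], i) + alpha * J[j])
                    for j in states) for i in states}
    return J

def improve(J):
    """Step 2: policy improvement by a one-step minimization."""
    return {i: min(controls,
                   key=lambda u: sum(P(j, u, i) * (C(j, u, i) + alpha * J[j])
                                     for j in states))
            for i in states}

mu = {i: "keep" for i in states}               # initial policy mu0
while True:
    J_mu = evaluate(mu)
    mu_next = improve(J_mu)
    if mu_next == mu:                          # policy is its own improvement
        break
    mu = mu_next
```

With these numbers the loop stops after a handful of iterations at the policy "replace only on failure", illustrating how few improvement steps are typically needed.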
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value Jμk(i).
While m ≥ 0 do:

J^m_μk(i) = Σ_{j∈ΩX} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_μk(j)]  ∀i ∈ ΩX

m ← m − 1

m    Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jμk is approximated by J^0_μk.
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X̄ ∈ ΩX, there is a unique λμ and vector hμ such that:

hμ(X̄) = 0

λμ + hμ(i) = Σ_{j∈ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)]  ∀i ∈ ΩX

This λμ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost λ* and optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ ΩX

μ*(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ ΩX
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; the method is then called relative value iteration. X̄ is an arbitrary fixed state and h0(i) is chosen arbitrarily:
Hk = min_{u∈ΩU(X̄)} Σ_{j∈ΩX} P(j, u, X̄) · [C(j, u, X̄) + hk(j)]

hk+1(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk  ∀i ∈ ΩX

μk+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)]  ∀i ∈ ΩX
The sequence hk converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.
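A sketch of the iteration on the same invented three-state replacement model used in the earlier examples (state 0 is taken as the arbitrary reference state X̄; the probabilities and costs are illustrative only):

```python
states, controls, ref = [0, 1, 2], ["keep", "replace"], 0

def P(j, u, i):
    """Transition probabilities P(j, u, i) of the toy model."""
    if u == "replace":
        return 1.0 if j == 0 else 0.0
    rows = {0: [0.7, 0.25, 0.05], 1: [0.0, 0.6, 0.4], 2: [0.0, 0.0, 1.0]}
    return rows[i][j]

def C(j, u, i):
    """Costs: 10 for a replacement, plus 5 for ending up in the failed state."""
    return (10.0 if u == "replace" else 0.0) + (5.0 if j == 2 else 0.0)

def q(i, u, h):
    """One-step lookahead sum over the next states j."""
    return sum(P(j, u, i) * (C(j, u, i) + h[j]) for j in states)

h = {i: 0.0 for i in states}                  # h0 chosen arbitrarily
for _ in range(1000):
    H = min(q(ref, u, h) for u in controls)   # offset taken at X_bar
    h = {i: min(q(i, u, h) for u in controls) - H for i in states}

avg_cost = H                                  # converges to lambda*
mu = {i: min(controls, key=lambda u: q(i, u, h)) for i in states}
```

Note that h(X̄) = 0 holds by construction at every iteration, and H converges to the optimal average cost per stage.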
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.
Initialisation: X̄ can be chosen arbitrarily.
Step 1: Policy Evaluation
If λq+1 = λq and hq+1(i) = hq(i) ∀i ∈ ΩX, stop the algorithm. Otherwise, solve the system of equations:

hq(X̄) = 0

λq + hq(i) = Σ_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + hq(j)]  ∀i ∈ ΩX
Step 2: Policy Improvement

μq+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hq(j)]  ∀i ∈ ΩX

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
34
For example, in the discounted IHSDP case, the optimality equation is:

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · J*(j)]  ∀i ∈ ΩX

J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈ΩX} J(i)

Subject to J(i) − α · Σ_{j∈ΩX} P(j, u, i) · J(j) ≤ Σ_{j∈ΩX} P(j, u, i) · C(j, u, i)  ∀u, i
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, the algorithm converges quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or a decision epoch can occur each time the state of the system changes. These kinds of problems are referred to as Semi-Markov Decision Processes (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite and the actions are not made continuously (problems with continuous actions belong to optimal control theory).
SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form (Xk, Xk+1, Uk, Ck).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk+1, Uk, Xk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates after a finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and that the sequence of transition costs C(Xk, Xk+1) = C(Xk+1, μ(Xk), Xk) has been observed.
The cost-to-go resulting from the trajectory starting from the state Xk is:

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V(Xk)    Cost-to-go of a trajectory starting from state Xk
If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} Vm(i)

Vm(i)    Cost-to-go of the trajectory starting from state i at its mth visit
A recursive form of the method can be formulated:

J(i) = J(i) + γ · [Vm(i) − J(i)], with γ = 1/m, m being the number of the trajectory
From a trajectory point of view:

J(Xk) = J(Xk) + γXk · [V(Xk) − J(Xk)]

γXk corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) must be calculated from the whole trajectory and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).
At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the lth transition has just been generated; then J(Xk) is updated for all the states visited previously during the trajectory:

J(Xk) = J(Xk) + γXk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]  ∀k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) = J(Xk) + γXk · λ^(l−k) · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]  ∀k = 0, ..., l
Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is:

J(Xk) = J(Xk) + γXk · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
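A runnable sketch of TD(0) evaluation of a fixed policy follows. The three-state replacement model is invented for illustration: the policy keeps the component while it works and replaces it on failure, and the replacement ends the episode (cost-free terminal state), so this is a stochastic shortest path setting:

```python
import random

random.seed(0)

states = [0, 1, 2]                 # 0 = new, 1 = worn, 2 = failed
TERMINAL = "done"                  # cost-free terminal state
rows = {0: [0.7, 0.25, 0.05], 1: [0.0, 0.6, 0.4]}

def step(i):
    """One transition under the fixed policy (keep in 0 and 1, replace in 2)."""
    if i == 2:                     # replace: pay 10 and end the episode
        return 10.0, TERMINAL
    j = random.choices(states, weights=rows[i])[0]
    return (5.0 if j == 2 else 0.0), j   # landing in the failed state costs 5

J = {0: 0.0, 1: 0.0, 2: 0.0, TERMINAL: 0.0}
visits = {0: 0, 1: 0, 2: 0}

for _ in range(30000):             # episodes, all starting from the new state
    x = 0
    while x != TERMINAL:
        cost, nxt = step(x)
        visits[x] += 1
        gamma = 1.0 / visits[x]    # decreasing step size 1/m
        J[x] += gamma * (cost + J[nxt] - J[x])   # TD(0) update
        x = nxt
```

For this model the exact cost-to-go values can be computed from the chain: J(0) = J(1) = 15 and J(2) = 10; the TD(0) estimates approach these values as episodes accumulate.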
Q-factors
Once Jμk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Qμk(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jμk(j)]

Note that P(j, u, i) and C(j, u, i) must be known. The improved policy is:

μk+1(i) = argmin_{u∈ΩU(i)} Qμk(i, u)

This is in fact an approximate version of the policy iteration algorithm, since Jμk and Qμk have been estimated using the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)    (7.2)
By combining the two equations, we obtain:

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]    (7.3)
Q*(i, u) is the unique solution of this equation, and the Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do:

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) = (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
The trade-off between exploration and exploitation
The convergence of the algorithm to the optimal solution would require that all the pairs (i, u) are tried infinitely often, which is not realistic.
In practice, a trade-off must be made between phases of exploitation, when a base policy (also called a greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
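A common way to implement this trade-off is an epsilon-greedy rule: explore a random control with a small probability, exploit the current greedy policy otherwise. The sketch below applies it to Q-learning on the invented discounted replacement model used in the earlier examples (probabilities, costs and the discount factor are illustrative only):

```python
import random

random.seed(1)

states, controls = [0, 1, 2], ["keep", "replace"]
alpha, epsilon = 0.9, 0.2                      # discount factor, exploration rate
rows = {0: [0.7, 0.25, 0.05], 1: [0.0, 0.6, 0.4], 2: [0.0, 0.0, 1.0]}

def step(i, u):
    """Simulate one transition; returns (cost, next state)."""
    if u == "replace":
        return 10.0, 0                         # replacement renews the component
    j = random.choices(states, weights=rows[i])[0]
    return (5.0 if j == 2 else 0.0), j         # landing in the failed state costs 5

Q = {(i, u): 0.0 for i in states for u in controls}
n = {(i, u): 0 for i in states for u in controls}

def greedy(i):
    return min(controls, key=lambda u: Q[(i, u)])

x = 0
for _ in range(200000):
    # explore with probability epsilon, exploit the greedy policy otherwise
    u = random.choice(controls) if random.random() < epsilon else greedy(x)
    cost, nxt = step(x, u)
    n[(x, u)] += 1
    gamma = 1.0 / n[(x, u)]                    # decreasing step size
    Q[(x, u)] += gamma * (cost + alpha * Q[(nxt, greedy(nxt))] - Q[(x, u)])
    x = nxt

policy = {i: greedy(i) for i in states}        # learned greedy policy
```

Note that no transition probabilities are used in the update itself; the model appears only inside the simulator step(), which a real-time application would replace by observed samples.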
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems, but for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function Jμ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the table representation investigated previously, Jμ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) − J̃(i, r).
There are many possible methods for function approximation; this field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
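The steps above can be sketched with the simplest possible approximation structure: a cost-to-go function over a scalar age feature is replaced by a linear form J̃(i, r) = r0 + r1 · i, and r is fitted by least squares. The training set here is synthetic (the "true" function J(i) = 2 + 0.5 · i and the noise level are invented for illustration):

```python
import random

random.seed(2)

def true_J(i):
    """Hypothetical true cost-to-go as a function of component age i."""
    return 2.0 + 0.5 * i

# Training set: noisy cost-to-go samples, as they would come from simulation.
samples = [(i, true_J(i) + random.gauss(0.0, 0.3))
           for _ in range(200) for i in range(10)]

def fit_line(samples):
    """Least-squares fit of r0 + r1*i via the 2x2 normal equations."""
    n = len(samples)
    sx = sum(i for i, _ in samples)
    sy = sum(v for _, v in samples)
    sxx = sum(i * i for i, _ in samples)
    sxy = sum(i * v for i, v in samples)
    r1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    r0 = (sy - r1 * sx) / n
    return r0, r1

r0, r1 = fit_line(samples)          # only the vector r = (r0, r1) is stored,
approx_J = lambda i: r0 + r1 * i    # not a table of J(i) for every i
```

The fitted parameters land close to (2.0, 0.5), and approx_J generalizes to ages not present in the training set, which is exactly the property asked of a function approximator above.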
An important difference between classical supervised learning and the one performedin reinforcement learning is that a real training set is not existing The trainingset are obtained either by simulation or from real-time samples This is already anapproximation of the real function
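The steps above can be sketched on a toy problem: the code below fits a quadratic approximation J̃(i, r) = r0 + r1·i + r2·i² to samples of a cost-to-go by linear least squares. The "true" cost-to-go and all numbers are invented for illustration; in practice the samples of Jµ would come from simulation or real-time operation.

```python
# Hypothetical example: approximate a sampled cost-to-go J(i) by a quadratic
# J~(i, r) = r[0] + r[1]*i + r[2]*i**2, fitting r by least squares
# (normal equations solved with Gaussian elimination).

def solve(M, v):
    """Solve the small dense linear system M x = v by Gaussian elimination."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def features(i):
    return [1.0, float(i), float(i) ** 2]

# Invented training samples of a "true" cost-to-go J(i) = 2 + 0.5 * i^2.
samples = [(i, 2.0 + 0.5 * i ** 2) for i in range(11)]

# Normal equations: (sum phi(i) phi(i)^T) r = sum phi(i) * J(i).
G = [[sum(features(i)[a] * features(i)[b] for i, _ in samples) for b in range(3)]
     for a in range(3)]
b = [sum(features(i)[a] * J for i, J in samples) for a in range(3)]
r = solve(G, b)

approx = lambda i: sum(ra * fa for ra, fa in zip(r, features(i)))
```

Only the three numbers in r are stored instead of one table entry per state, and J̃(i, r) also generalizes to states not present in the training set.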
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates
are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are both possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined for deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; it could then be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence
of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require a model of the system to exist: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Disadvantage: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; classical methods, with possible approaches for MDP:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI): can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; Policy Iteration (PI): faster in general
  - Shortest path: Linear Programming: possible additional constraints; state space more limited than with VI and PI

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application: optimization of inspection-based maintenance
  Method: same as MDP
  Disadvantage: complex (average cost-to-go approach)

Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than classical MDP methods
  Possible application: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantage: can work without an explicit model
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to
do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was incorporated in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers

NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for the component
NCM   Number of corrective maintenance states for the component

Costs

CE(s, k)  Electricity cost at stage k in electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage

State and Control Space
x1k   Component state at stage k
x2k   Electricity state at stage k
Probability functions

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets

Ωx1     Component state space
Ωx2     Electricity state space
ΩU(i)   Decision space for state i
State notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state
9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, …, N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.
• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.
• If the system is not working, a cost for interruption CI per stage is considered.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, …, N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).
The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1,  x2k ∈ Ωx2      (9.1)

Ωx1 is the set of possible states for the component, and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component are NCM and NPM, respectively.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can for example correspond to the time when λ(t) exceeds a fixed level. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
[State diagram: from each Wq there is a transition to Wq+1 (or from WNW to itself) with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); the PM and CM states chain back to W0, each with probability 1.]
Figure 9.1: Example of Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, …, W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, …, WNW, PM1, …, PM(NPM−1), CM1, …, CM(NCM−1)}
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, …, SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country like Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
[Plot: electricity price (SEK/MWh, roughly 200–500) as a function of stage for the three scenarios.]
Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1}  if i1 ∈ {W1, …, WNW}
ΩU(i) = ∅       otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios over a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u    j1       P(j1, u, i1)
Wq, q ∈ {0, …, NW−1}        0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, …, NW−1}        0    CM1      λ(Wq)
WNW                         0    WNW      1 − λ(WNW)
WNW                         0    CM1      λ(WNW)
Wq, q ∈ {0, …, NW}          1    PM1      1
PMq, q ∈ {1, …, NPM−2}      ∅    PMq+1    1
PM(NPM−1)                   ∅    W0       1
CMq, q ∈ {1, …, NCM−2}      ∅    CMq+1    1
CM(NCM−1)                   ∅    W0       1
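As a sketch of how Table 9.1 can be turned into an executable transition model, the code below returns the next-state distribution for each component state and decision. The numerical failure probabilities are invented stand-ins for λ(Wq); NW, NPM and NCM follow the example of Figure 9.1.

```python
# Sketch of Table 9.1: one-component state transitions. States are tuples
# ('W', q), ('PM', q), ('CM', q); u = 1 means preventive replacement,
# u = 0 means operate, u = None for the forced maintenance stages.
NW, NPM, NCM = 4, 2, 3
p = {q: 0.02 + 0.03 * q for q in range(NW + 1)}  # invented values of lambda(Wq)

def transitions(state, u):
    """Return {next_state: probability} according to Table 9.1."""
    kind, q = state
    if kind == 'W' and u == 1:
        # preventive replacement starts (PM1, or directly W0 if NPM = 1)
        return {('PM', 1) if NPM > 1 else ('W', 0): 1.0}
    if kind == 'W':
        # component ages (or stays in WNW) unless it fails during the stage
        nxt = ('W', q + 1) if q < NW else ('W', NW)
        fail = ('CM', 1) if NCM > 1 else ('W', 0)
        return {nxt: 1.0 - p[q], fail: p[q]}
    if kind == 'PM':
        return {('PM', q + 1) if q < NPM - 1 else ('W', 0): 1.0}
    # kind == 'CM': corrective maintenance chain ends in a new component W0
    return {('CM', q + 1) if q < NCM - 1 else ('W', 0): 1.0}
```

Every returned distribution sums to one, and the chain CM1 → CM2 → W0 reproduces the forced repair path of the figure.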
Table 9.2: Example of transition matrices for the electricity scenarios

P1E = | 1  0  0 |     P2E = | 1/3  1/3  1/3 |     P3E = | 0.6  0.2  0.2 |
      | 0  1  0 |           | 1/3  1/3  1/3 |           | 0.2  0.6  0.2 |
      | 0  0  1 |           | 1/3  1/3  1/3 |           | 0.2  0.2  0.6 |
Table 9.3: Example of transition probabilities over a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
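The stage-dependent electricity chain of Tables 9.2 and 9.3 can be checked numerically. The sketch below propagates a scenario probability distribution through the 12-stage schedule (rows of a matrix index the current scenario i2, columns the next scenario j2, as stated above).

```python
# Propagate a distribution over the three electricity scenarios through the
# 12-stage schedule of Table 9.3 (P1E = identity, P2E = uniform mixing,
# P3E = "sticky" scenarios).
P1E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2E = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
P3E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def step(dist, P):
    """One stage: dist'[j2] = sum over i2 of dist[i2] * P[i2][j2]."""
    return [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

dist = [1.0, 0.0, 0.0]            # start in scenario S1 with certainty
for P in schedule:
    dist = step(dist, P)
```

Because the uniform-mixing stages (P2E) occur in the middle of the horizon, the distribution at the final stage is uniform over the scenarios regardless of the starting scenario.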
9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: CCM or CPM
• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                          u    j1       Ck(j, u, i)
Wq, q ∈ {0, …, NW−1}        0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, …, NW−1}        0    CM1      CI + CCM
WNW                         0    WNW      G · Ts · CE(i2, k)
WNW                         0    CM1      CI + CCM
Wq                          1    PM1      CI + CPM
PMq, q ∈ {1, …, NPM−2}      ∅    PMq+1    CI + CPM
PM(NPM−1)                   ∅    W0       CI + CPM
CMq, q ∈ {1, …, NCM−2}      ∅    CMq+1    CI + CCM
CM(NCM−1)                   ∅    W0       CI + CCM
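The model defined by Tables 9.1 and 9.4 is solved by backward induction (finite horizon value iteration): J_N(i) = CN(i) and J_k(i) = min over u of Σ_j P(j, u, i) · [Ck(j, u, i) + J_{k+1}(j)]. The sketch below implements this generic recursion and exercises it on a deliberately tiny two-state operate-or-replace instance with invented numbers, not on the full model of this chapter.

```python
# Generic finite-horizon backward induction:
#   J_N(i) = terminal(i)
#   J_k(i) = min_u sum_j P(j | i, u) * (cost(k, i, u, j) + J_{k+1}(j))
def backward_induction(N, states, controls, P, cost, terminal):
    J = {i: terminal(i) for i in states}
    policy = []
    for k in range(N - 1, -1, -1):
        Jk, uk = {}, {}
        for i in states:
            best = None
            for u in controls(i):
                q = sum(p * (cost(k, i, u, j) + J[j]) for j, p in P(i, u).items())
                if best is None or q < best[0]:
                    best = (q, u)
            Jk[i], uk[i] = best
        J = Jk
        policy.insert(0, uk)
    return J, policy   # J is the stage-0 cost-to-go, policy[k] the stage-k rule

# Tiny invented instance: a unit that is Working or Failed. Operating earns
# revenue 10 (cost -10) but fails with probability 0.3 during the stage;
# replacing (u = 1) costs 4; repairing a failed unit costs 8.
states = ['W', 'F']
controls = lambda i: [0, 1] if i == 'W' else [0]
P = lambda i, u: {'W': 0.7, 'F': 0.3} if (i == 'W' and u == 0) else {'W': 1.0}
cost = lambda k, i, u, j: -10.0 if (i == 'W' and u == 0) else (4.0 if u == 1 else 8.0)

J0, policy = backward_induction(2, states, controls, P, cost, lambda i: 0.0)
```

For this instance, J0('W') = −10 + 0.7 · (−10) + 0.3 · 8 = −14.6, and the optimal stage-0 decision in W is to operate.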
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.

This can be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rental can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c
Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, …, NC}    State of component c at the current stage
iNC+1                 Electricity state at the current stage
jc, c ∈ {1, …, NC}    State of component c at the next stage
jNC+1                 Electricity state at the next stage
uc, c ∈ {1, …, NC}    Decision variable for component c

State and Control Space

xck, c ∈ {1, …, NC}   State of component c at stage k
xc                    A component state
xNC+1,k               Electricity state at stage k
uck                   Maintenance decision for component c at stage k

Probability functions

λc(i)   Failure probability function for component c

Sets

Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, …, NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.
• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is done on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, …, xNCk, xNC+1,k)      (9.2)

xck, c ∈ {1, …, NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component space

The numbers of CM and PM states for component c are NCMc and NPMc, respectively. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, …, WNWc, PM1, …, PM(NPMc−1), CM1, …, CM(NCMc−1)}

Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, …, uNCk)      (9.3)

The decision space for each decision variable is defined by:

∀c ∈ {1, …, NC}:  Ωuc(ic) = {0, 1}  if ic ∈ {W0, …, WNWc},  ∅ otherwise
9.2.4.3 Transition Probabilities

The component state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)                                             (9.4)
  = P((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) · P(jNC+1, iNC+1)        (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions
The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of the components considered independently.

If ∀c ∈ {1, …, NC}: ic ∈ {W1, …, WNWc},

P((j1, …, jNC), 0, (i1, …, iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)
Case 2

If one of the components is in maintenance, or preventive maintenance is decided for some component:

P((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) = ∏_{c=1}^{NC} P^c

with
P^c = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, …, WNWc}
P^c = 1             if uc = 0, ic ∈ {W1, …, WNWc} and jc = ic
P^c = 0             otherwise
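Case 1 can be illustrated directly: when all components are working and no maintenance is decided, the joint transition distribution is the product of the per-component distributions. The two per-component distributions below are invented examples.

```python
# Sketch of Case 1: joint transition probability as a product over components.
from itertools import product

# Invented per-component next-state distributions (u = 0, both working):
comp = [
    {'W2': 0.9, 'CM1': 0.1},   # component 1: currently in W1
    {'W4': 0.8, 'CM1': 0.2},   # component 2: currently in W3
]

joint = {}
for pairs in product(*(d.items() for d in comp)):
    next_states = tuple(j for j, _ in pairs)   # (j1, j2)
    prob = 1.0
    for _, p in pairs:
        prob *= p
    joint[next_states] = prob
```

Here joint[('W2', 'W4')] = 0.9 · 0.8 = 0.72, and the four joint probabilities sum to one. In Case 2, a working component c with uc = 0 would instead contribute a degenerate factor P^c = 1 on jc = ic, since a component does not age while the system is stopped.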
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, …, NC}: ic ∈ {W1, …, WNWc},

C((j1, …, jNC), 0, (i1, …, iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) = CI + Σ_{c=1}^{NC} Cc

with
Cc = CCMc  if ic ∈ {CM1, …, CM(NCMc−1)} or jc = CM1
Cc = CPMc  if ic ∈ {PM1, …, PM(NPMc−1)} or jc = PM1
Cc = 0     otherwise
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, rather than an individual decision space for each component state variable.
• Other types of maintenance actions: in the model, replacement is the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states: it could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate, the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model is to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The methods have until now mainly been applied to optimal control, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts over a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of the complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
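The computation above can be checked mechanically. The sketch below encodes the arc costs used in the appendix (nodes numbered per stage: A = stage 0; B, C, D = stage 1; E, F, G = stage 2; H, I, J = stage 3; one terminal node at stage 4) and runs the same backward recursion.

```python
# Value iteration for the shortest path example; C[k][(i, j)] is the cost of
# going from node i at stage k to node j at stage k + 1.
C = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},                  # stage 0 (A)
    {(0, 0): 4, (0, 1): 6,
     (1, 0): 2, (1, 1): 1, (1, 2): 3,
     (2, 1): 5, (2, 2): 2},                             # stage 1 (B, C, D)
    {(0, 0): 2, (0, 1): 5,
     (1, 0): 7, (1, 1): 3, (1, 2): 2,
     (2, 1): 1, (2, 2): 2},                             # stage 2 (E, F, G)
    {(0, 0): 4, (1, 0): 2, (2, 0): 7},                  # stage 3 (H, I, J)
]

J = {0: 0}  # terminal node: J*_4(0) = phi(0) = 0
for k in range(3, -1, -1):
    Jk = {}
    for (i, j), c in C[k].items():
        Jk[i] = min(Jk.get(i, float('inf')), c + J[j])
    J = Jk
# J[0] = 8, reproducing J*_0(A) with the path A -> D -> G -> I -> terminal
```

This confirms the stage-0 cost-to-go of 8 computed by hand above.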
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435-441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464-469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75-83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156-162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467-476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15-24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157-179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75-82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452-456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1-23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411-435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533-537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179-186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387-393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150-155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145-149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507-515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117-118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence '99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31-37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223-229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293-294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556-560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176-1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167-173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23-28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469-489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
N            Number of stages
k            Stage
i            State at the current stage
j            State at the next stage
X_k          State at stage k
U_k          Decision (action) at stage k
ω_k(i,u)     Probabilistic function of the disturbance
C_k(i,u,j)   Cost function
C_N(i)       Terminal cost for state i
f_k(i,u,ω)   Dynamic function
J*_0(i)      Optimal cost-to-go starting from state i
5.2 Optimality Equation
The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u∈Ω_U^k(i)} E[C_k(i,u) + J*_{k+1}(f_k(i,u,ω))]   (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u∈Ω_U^k(i)} Σ_{j∈Ω_X^{k+1}} P_k(i,u,j) · [C_k(i,u,j) + J*_{k+1}(j)]   (5.2)

Ω_X^k        State space at stage k
Ω_U^k(i)     Decision space at stage k for state i
P_k(i,u,j)   Transition probability function
5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on Equation 5.2. The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*_N(i) = C_N(i)   ∀i ∈ Ω_X^N   (initialisation)

While k ≥ 0 do:
    J*_k(i) = min_{u∈Ω_U^k(i)} Σ_{j∈Ω_X^{k+1}} P_k(i,u,j) · [C_k(i,u,j) + J*_{k+1}(j)]   ∀i ∈ Ω_X^k
    U*_k(i) = argmin_{u∈Ω_U^k(i)} Σ_{j∈Ω_X^{k+1}} P_k(i,u,j) · [C_k(i,u,j) + J*_{k+1}(j)]   ∀i ∈ Ω_X^k
    k ← k − 1
u        Decision variable
U*_k(i)  Optimal decision (action) at stage k for state i

The recursion finishes when the first stage is reached.
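As an illustration, the backward recursion can be reproduced in a few lines of code. The sketch below assumes the arc costs of the shortest path example above, with the decision u interpreted as the state reached at the next stage:

```python
# Finite-horizon value iteration (backward recursion) on the deterministic
# shortest-path example; C[(k, i, u)] is the cost of choosing u in state i
# at stage k, where u is also the state reached at stage k + 1.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}
N = 4
J = {(N, 0): 0.0}                       # terminal cost phi(0) = 0
policy = {}
for k in range(N - 1, -1, -1):          # backward recursion over the stages
    for i in {i for (kk, i, u) in C if kk == k}:
        # candidate cost-to-go for every admissible decision u in state i
        q = {u: C[(k, i, u)] + J[(k + 1, u)]
             for (kk, ii, u) in C if kk == k and ii == i}
        u_star = min(q, key=q.get)
        J[(k, i)], policy[(k, i)] = q[u_star], u_star
print(J[(0, 0)], policy[(0, 0)])        # prints: 8.0 2
```

The printed result matches the hand calculation: optimal total cost 8, first decision u*_0(0) = 2.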
5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages,

• N_X state variables, where the size of the set for each state variable is S,

• N_U control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem thus increases exponentially with the size of the problem (the number of state and decision variables). This characteristic of SDP is called the curse of dimensionality.
5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for a component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered in order to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could also be possible to have different types of failure states, such as major and minor failures. Minor failures could be cleared by repair, while after a major failure a component should be replaced.
5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is, or can be, subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to the horizon of validity of the forecasts. It would also be possible to generate different scenarios from forecasts, solve the problem for each scenario, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. If there is no consumption, some generation units are stopped, and this time can be used for maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be valuable for optimizing maintenance actions on offshore wind farms.
5.5.3 Time Lags

An important assumption of a DP model is that the dynamic of the system only depends on the current state of the system (and possibly on the time, if the system dynamic is not stationary).

This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage, since this would give information about the dynamic of the deterioration process.
Chapter 6
Infinite Horizon Models -
Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamic of the system, the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. IHSDP can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.
6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, but for the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages; this sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω_X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω_U(i).

The objective is to find the optimal policy μ*, which minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are incurred.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1

μ       Decision policy
J*(i)   Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α (0 < α < 1) is the discount factor. The cost incurred at stage k has the form α^k · C_ij(u).

Since C_ij(u) is bounded, the infinite sum converges (a decreasing geometric progression).

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1

α   Discount factor
Average cost per stage problems
Some infinite horizon problems can neither be represented with a cost-free termination state nor be discounted.

To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize:

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1
6.2 Optimality Equations

The optimality equations are formulated using the transition probability function P(j,u,i).

The stationary policy μ* that solves an IHSDP shortest path problem satisfies the Bellman equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + J*(j)]   ∀i ∈ Ω_X

J_μ(i)   Cost-to-go function of policy μ starting from state i
J*(i)    Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + α · J*(j)]   ∀i ∈ Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown that it converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
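As a sketch, value iteration for a discounted IHSDP can be implemented as follows, on a hypothetical two-state maintenance MDP (state 0 = "good", state 1 = "failed"; action 0 = "do nothing", action 1 = "maintain"; all transition probabilities and costs are invented for illustration):

```python
import numpy as np

# Hypothetical two-state maintenance MDP (all numbers invented).
# P[u, i, j] plays the role of P(j, u, i); C[u, i, j] that of C(j, u, i).
P = np.array([[[0.8, 0.2], [0.0, 1.0]],     # u = 0: do nothing
              [[0.95, 0.05], [1.0, 0.0]]])  # u = 1: maintain
C = np.array([[[0.0, 10.0], [0.0, 10.0]],
              [[2.0, 12.0], [5.0, 15.0]]])
alpha, tol = 0.9, 1e-9                      # discount factor, tolerance
J = np.zeros(2)
while True:
    # Q[i, u] = sum_j P(j, u, i) * [C(j, u, i) + alpha * J(j)]
    Q = np.einsum('uij,uij->iu', P, C + alpha * J[None, None, :])
    J_new = Q.min(axis=1)
    if np.max(np.abs(J_new - J)) < tol:     # sup-norm stopping criterion
        break
    J = J_new
policy = Q.argmin(axis=1)                   # greedy policy w.r.t. J
```

Because α < 1 the backup is a contraction, which is why the simple sup-norm stopping criterion terminates; with these invented numbers the resulting policy is to do nothing while the component is good and to maintain once it has failed.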
6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is applied iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ^0. It can then be described by the following steps:

Step 1: Policy Evaluation

If μ^(q+1) = μ^q, stop the algorithm. Else, calculate J_{μ^q}(i) as the solution of the following linear system:

J_{μ^q}(i) = Σ_{j∈Ω_X} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + J_{μ^q}(j)]

q   Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ^q.

Step 2: Policy Improvement

A new policy is obtained using one value iteration step:

μ^(q+1)(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + J_{μ^q}(j)]

Go back to the policy evaluation step.

The process stops when μ^(q+1) = μ^q.

At each iteration the algorithm improves the policy. If the initial policy μ^0 is already good, then the algorithm will converge quickly to the optimal solution.
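The two steps can be sketched as follows for a discounted problem, again on a hypothetical two-state maintenance MDP (all probabilities and costs invented). The evaluation step solves the linear system (I − α·P_μ)·J = c_μ exactly:

```python
import numpy as np

# Hypothetical two-state maintenance MDP (all numbers invented):
# state 0 = good, state 1 = failed; action 0 = do nothing, 1 = maintain.
P = np.array([[[0.8, 0.2], [0.0, 1.0]],
              [[0.95, 0.05], [1.0, 0.0]]])
C = np.array([[[0.0, 10.0], [0.0, 10.0]],
              [[2.0, 12.0], [5.0, 15.0]]])
alpha, n = 0.9, 2
mu = np.zeros(n, dtype=int)                 # initial policy: always "do nothing"
while True:
    P_mu = P[mu, np.arange(n)]              # transition matrix under policy mu
    c_mu = (P_mu * C[mu, np.arange(n)]).sum(axis=1)   # expected stage cost
    # Step 1 (policy evaluation): solve (I - alpha * P_mu) J = c_mu
    J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
    # Step 2 (policy improvement): greedy policy w.r.t. the evaluated J
    Q = np.einsum('uij,uij->iu', P, C + alpha * J[None, None, :])
    mu_next = Q.argmin(axis=1)
    if np.array_equal(mu_next, mu):         # policy is its own improvement
        break
    mu = mu_next
```

Compared with value iteration, each iteration is more expensive (a linear solve), but the number of iterations is finite and usually very small.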
6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, in each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the cost-to-go function of the policy. The algorithm is initialized with a value function J^M_{μ^k}(i) that must be chosen higher than the true value J_{μ^k}(i).
While m ≥ 0 do:
    J^m_{μ^k}(i) = Σ_{j∈Ω_X} P(j, μ^k(i), i) · [C(j, μ^k(i), i) + J^(m+1)_{μ^k}(j)]   ∀i ∈ Ω_X
    m ← m − 1

m   Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ^k} is approximated by J^0_{μ^k}.
6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated, and the convergence of the algorithms requires conditions on the Markov decision process. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and an arbitrary reference state X̄ ∈ Ω_X, there is a unique scalar λ_μ and a unique vector h_μ such that:

h_μ(X̄) = 0

λ_μ + h_μ(i) = Σ_{j∈Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)]   ∀i ∈ Ω_X

This λ_μ is the average cost-to-go per stage of the stationary policy μ; it is the same for all starting states.

The optimal average cost λ* and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + h*(j)]   ∀i ∈ Ω_X

μ*(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + h*(j)]   ∀i ∈ Ω_X
6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. X̄ is an arbitrary reference state and h^0(i) is chosen arbitrarily.

H^k = min_{u∈Ω_U(X̄)} Σ_{j∈Ω_X} P(j,u,X̄) · [C(j,u,X̄) + h^k(j)]

h^(k+1)(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + h^k(j)] − H^k   ∀i ∈ Ω_X

μ^(k+1)(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + h^k(j)]   ∀i ∈ Ω_X

The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
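A sketch of relative value iteration on a hypothetical two-state unichain MDP (all numbers invented), with the arbitrary reference state taken as state 0:

```python
import numpy as np

# Hypothetical unichain two-state MDP (invented numbers); reference state 0.
P = np.array([[[0.8, 0.2], [0.0, 1.0]],
              [[0.95, 0.05], [1.0, 0.0]]])
C = np.array([[[0.0, 10.0], [0.0, 10.0]],
              [[2.0, 12.0], [5.0, 15.0]]])
h = np.zeros(2)
for _ in range(1000):
    # (T h)(i) = min_u sum_j P(j, u, i) * [C(j, u, i) + h(j)]
    Th = np.einsum('uij,uij->iu', P, C + h[None, None, :]).min(axis=1)
    H = Th[0]                # normalizing term: (T h) at the reference state
    h = Th - H               # h^{k+1}(i) = (T h)(i) - H^k, so h(ref) stays 0
# H approximates the optimal average cost per stage lambda*
```

With these invented numbers the normalizing term H converges to the optimal average cost per stage, here λ* = 2.5, attained by the policy that maintains only in the failed state.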
6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: the algorithm starts with an initial policy μ^0; the reference state X̄ can be chosen arbitrarily.

Step 1: Policy Evaluation
If λ^(q+1) = λ^q and h^(q+1)(i) = h^q(i) ∀i ∈ Ω_X, stop the algorithm.
Else, solve the system of equations:

h^q(X̄) = 0
λ^q + h^q(i) = Σ_{j∈Ω_X} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + h^q(j)]   ∀i ∈ Ω_X

Step 2: Policy Improvement

μ^(q+1)(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + h^q(j)]   ∀i ∈ Ω_X

q ← q + 1
6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case the optimal cost-to-go function satisfies:

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + α · J*(j)]   ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize   Σ_{i∈Ω_X} J(i)

Subject to   J(i) − α · Σ_{j∈Ω_X} P(j,u,i) · J(j) ≤ Σ_{j∈Ω_X} P(j,u,i) · C(j,u,i)   ∀u ∈ Ω_U(i), ∀i ∈ Ω_X

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

Let n and m denote the numbers of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m; a DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite, but the actions are not taken continuously (problems of that kind belong to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is a machine learning approach that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented; they make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to be able to predict the output for any kind of possible input. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and that use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6: the system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious modelling step does not need to be carried out first. The state and decision spaces are assumed to be known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods presented in Section 7.4 are extensions of the methods presented in Section 7.2; they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space.
7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k), where X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j,u,i) and costs C(j,u,i).
7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6, and it can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state X_k, is:

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k)   Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m)   Cost-to-go of the trajectory starting from state i after its m-th visit

A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(i_m) − J(i)],   with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) = J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

with γ_{X_k} corresponding to 1/m, where m is the number of times X_k has already been visited by trajectories.

With the preceding algorithm, V(X_k) has to be calculated from the whole trajectory, so the updates can only be made once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).

At each transition of the trajectory, the cost-to-go estimates of the previously visited states are updated. Assume that the l-th transition has just been generated; then J(X_k) is updated for all the states that have been visited earlier in the trajectory:

J(X_k) = J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],   ∀k = 0, ..., l

TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J(X_k) = J(X_k) + γ_{X_k} · λ^(l−k) · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],   ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0; the TD(0) algorithm is:

J(X_k) = J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]

Q-factors. Once J_{μ^k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Q_{μ^k}(i, u) = Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + J_{μ^k}(j)]

Note that P(j,u,i) and C(j,u,i) must be known for this step. The improved policy is:

μ^(k+1)(i) = argmin_{u∈Ω_U(i)} Q_{μ^k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μ^k} and Q_{μ^k} have been estimated using the samples.
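As a sketch, TD(0) policy evaluation can be implemented as follows on a hypothetical three-state shortest-path chain (all numbers invented): from states 0 and 1 the chain advances to the next state with probability 0.7 at cost 1, or stays put at cost 2, and state 2 is a cost-free terminal state. The step size γ_{X_k} = 1/m is the one described in the text:

```python
import random

# TD(0) evaluation of a fixed policy on an invented three-state chain.
random.seed(0)
J = [0.0, 0.0, 0.0]          # J(2) stays 0: cost-free terminal state
visits = [0, 0, 0]
for episode in range(20000):
    x = 0
    while x != 2:            # every trajectory reaches the terminal state
        nxt = x + 1 if random.random() < 0.7 else x
        cost = 1.0 if nxt == x + 1 else 2.0
        visits[x] += 1
        gamma = 1.0 / visits[x]          # step size 1/m, m = visits so far
        # TD(0) update: J(x) = J(x) + gamma * [C(x, x') + J(x') - J(x)]
        J[x] += gamma * (cost + J[nxt] - J[x])
        x = nxt
```

For this chain the exact values are J(1) = 13/7 ≈ 1.86 and J(0) = 26/7 ≈ 3.71, which the estimates approach as trajectories accumulate.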
7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain:

Q*(i, u) = Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)

Q*(i, u) is the unique solution of this equation, and the Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The exploration/exploitation trade-off. Convergence of the algorithm to the optimal solution requires that all pairs (i, u) are tried infinitely often, which is not realistic in practice.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called the greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section for each sample of experience, or

- building the model of the transition probabilities and the cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the previous section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a parameter vector that is optimized based on the available samples of J_μ. In the tabular representation investigated previously, J_μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

A function approximator must generalize well over the state space the information gained from the samples. In other words, it should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the one performed in reinforcement learning is that no real training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
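As a sketch of the idea, the following fits a linear approximation J̃(i, r) = φ(i)ᵀr to samples of a cost-to-go function by least squares. The feature map, the training samples and the quadratic target are illustrative assumptions; only the three entries of r are stored instead of a table of J_μ(i) for every state i.

```python
# Sketch of supervised learning of an approximate cost-to-go function
# J~(i, r) = phi(i) . r, fitted by least squares to sampled values of
# J_mu(i). Feature map and samples are illustrative assumptions.

def phi(i):
    """Polynomial features of the (scalar) state i."""
    return [1.0, i, i * i]

def fit_least_squares(samples):
    """Solve the 3x3 normal equations A r = b for the parameter vector r."""
    n = 3
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, J in samples:                 # accumulate normal equations
        f = phi(i)
        for p in range(n):
            b[p] += f[p] * J
            for q in range(n):
                A[p][q] += f[p] * f[q]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r_: abs(A[r_][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r_ in range(col + 1, n):
            m = A[r_][col] / A[col][col]
            for c in range(col, n):
                A[r_][c] -= m * A[col][c]
            b[r_] -= m * b[col]
    r = [0.0] * n
    for row in range(n - 1, -1, -1):     # back substitution
        s = b[row] - sum(A[row][c] * r[c] for c in range(row + 1, n))
        r[row] = s / A[row][row]
    return r

# Hypothetical training set: samples of a quadratic cost-to-go function
samples = [(i, 2.0 + 0.5 * i + 0.1 * i * i) for i in range(10)]
r = fit_least_squares(samples)
# r recovers approximately [2.0, 0.5, 0.1]
```

In a real application the samples would come from simulation or real-time experience, and a richer approximator (e.g. a neural network) would replace the polynomial features.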
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, as well as possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Process
Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm be trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods are recommended.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an existing model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces, in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 summarizes the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model
  Methods: classical methods for MDP, with possible approaches:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI) can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; Policy Iteration (PI) is faster in general
  - Shortest path: Linear Programming allows possible additional constraints, but the state space is more limited than with VI & PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces compared with classical MDP methods
  Possible application: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application: optimization for inspection-based maintenance
  Method: same as MDP (average cost-to-go approach)
  Advantages/disadvantages: complex
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers
N_E   Number of electricity scenarios
N_W   Number of working states for the component
N_PM  Number of preventive maintenance states for one component
N_CM  Number of corrective maintenance states for one component

Costs
C_E(s, k)  Electricity cost at stage k for electricity state s
C_I        Cost per stage for interruption
C_PM       Cost per stage of preventive maintenance
C_CM       Cost per stage of corrective maintenance
C_N(i)     Terminal cost if the component is in state i

Variables
i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space
x1_k  Component state at stage k
x2_k  Electricity state at stage k

Probability functions
λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state W_i

Sets
Ω_x1    Component state space
Ω_x2    Electricity state space
Ω_U(i)  Decision space for state i

State notations
W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, an interruption cost C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space

The state vector X_k is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (N_X = 2).

The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k)ᵀ,  x1_k ∈ Ω_x1, x2_k ∈ Ω_x2    (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age T_max is reached; in this case T_max can, for example, correspond to the time after which λ(t) > 50%. This second approach was implemented. The corresponding number of W states is N_W = T_max/Ts, or the closest integer, in both cases.
[Diagram: Markov chain over states W0–W4, PM1, CM1, CM2. From each state Wq, the transition to Wq+1 (or from W4 to itself) has probability 1 − Ts·λ(q) and the transition to CM1 has probability Ts·λ(q) under u = 0; under u = 1 the component moves to PM1 with probability 1; PM and CM states advance deterministically back to W0.]

Figure 9.1: Example of the Markov decision process for one component, with N_CM = 3, N_PM = 2, N_W = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}
Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
[Plot: electricity price (SEK/MWh, ranging from 200 to 500) versus stage (k−1, k, k+1) for Scenarios 1, 2 and 3.]

Figure 9.2: Example of electricity scenarios, N_E = 3.
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW}; ∅ otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
  = P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
  = P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
  = P(j1, u, i1) · P_k(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E. i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                           u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., N_W−1}      0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., N_W−1}      0   CM1     λ(Wq)
W_NW                         0   W_NW    1 − λ(W_NW)
W_NW                         0   CM1     λ(W_NW)
Wq, q ∈ {0, ..., N_W}        1   PM1     1
PMq, q ∈ {1, ..., N_PM−2}    ∅   PMq+1   1
PM_{N_PM−1}                  ∅   W0      1
CMq, q ∈ {1, ..., N_CM−2}    ∅   CMq+1   1
CM_{N_CM−1}                  ∅   W0      1
Table 9.2: Example of transition matrices for electricity scenarios

P1_E = [1 0 0; 0 1 0; 0 0 1]

P2_E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]

P3_E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)      0     1     2     3     4     5     6     7     8     9     10    11
P_k(j2, i2)    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
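As a small sketch of how the stage-dependent electricity transitions of Tables 9.2 and 9.3 behave, the following propagates an initial scenario distribution through the 12-stage horizon. The matrices and schedule are taken from the tables; the starting distribution is an illustrative assumption.

```python
# Sketch: propagating the electricity scenario distribution over the
# 12-stage horizon of Table 9.3, using the matrices of Table 9.2.
P1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2 = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# P_k(j2, i2) for k = 0, ..., 11 (rows: i2, columns: j2), as in Table 9.3
P_stage = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

def propagate(dist, horizon):
    """Push a probability distribution over scenarios through the stages."""
    for P in horizon:
        dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]
    return dist

# start in scenario S1 with certainty
dist = propagate([1.0, 0.0, 0.0], P_stage)
# the first P2 matrix (stage 5) makes the distribution uniform, and the
# remaining matrices preserve uniformity, so dist ends at [1/3, 1/3, 1/3]
```

The identity matrices (P1_E) model the stable part of the year, while P2_E and P3_E model the transient summer period.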
9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · C_E(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: C_CM or C_PM

• Cost for interruption: C_I

Moreover, a terminal cost, noted C_N(i) and defined for each possible terminal state i of the component, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4: Transition costs

i1                           u   j1      C_k(j, u, i)
Wq, q ∈ {0, ..., N_W−1}      0   Wq+1    G · Ts · C_E(i2, k)
Wq, q ∈ {0, ..., N_W−1}      0   CM1     C_I + C_CM
W_NW                         0   W_NW    G · Ts · C_E(i2, k)
W_NW                         0   CM1     C_I + C_CM
Wq                           1   PM1     C_I + C_PM
PMq, q ∈ {1, ..., N_PM−2}    ∅   PMq+1   C_I + C_PM
PM_{N_PM−1}                  ∅   W0      C_I + C_PM
CMq, q ∈ {1, ..., N_CM−2}    ∅   CMq+1   C_I + C_CM
CM_{N_CM−1}                  ∅   W0      C_I + C_CM
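A minimal sketch of the value iteration algorithm applied to the one-component model, using the transition structure of Table 9.1 and the cost structure of Table 9.4, is given below. For brevity a single electricity scenario with a constant price is assumed, and all numerical values (failure rates, costs, production) are illustrative assumptions, not thesis data.

```python
# Sketch of value iteration for the one-component replacement model
# (Tables 9.1 and 9.4). A single electricity scenario with constant price
# is assumed; all numerical values are illustrative assumptions.
NW, NPM, NCM = 4, 2, 3
Ts = 1.0                                        # stage length (hours)
G, CE = 100.0, 0.5                              # production (kW), price per kWh
CI, CPM, CCM = 20.0, 10.0, 50.0                 # interruption / PM / CM costs
lam = [0.01 * (q + 1) for q in range(NW + 1)]   # failure rate lambda(W_q)

states = ([f"W{q}" for q in range(NW + 1)]
          + [f"PM{p}" for p in range(1, NPM)]
          + [f"CM{p}" for p in range(1, NCM)])

def transitions(i, u):
    """List of (next_state, probability, cost) following Tables 9.1/9.4.
    The electricity reward enters as a negative cost."""
    if i.startswith("W"):
        q = int(i[1:])
        if u == 1:
            return [("PM1", 1.0, CI + CPM)]
        nxt = f"W{min(q + 1, NW)}"              # W_NW stays in W_NW
        return [(nxt, 1.0 - lam[q], -G * Ts * CE), ("CM1", lam[q], CI + CCM)]
    kind, p = i[:2], int(i[2:])                 # PM or CM chain
    last = (NPM if kind == "PM" else NCM) - 1
    cost = CI + (CPM if kind == "PM" else CCM)
    return [("W0" if p == last else f"{kind}{p + 1}", 1.0, cost)]

def value_iteration(N):
    J = {i: 0.0 for i in states}                # terminal cost is zero
    for _ in range(N):
        J = {i: min(sum(p * (c + J[j]) for j, p, c in transitions(i, u))
                    for u in ([0, 1] if i.startswith("W") and i != "W0"
                              else [0]))
             for i in states}
    return J

J = value_iteration(N=52)
# J["W0"] is the optimal expected cost over the horizon for a new component
```

Because failure rates increase with age and maintenance states incur interruption costs, the computed cost-to-go satisfies J["W0"] < J["W4"] < J["CM1"] for these numbers.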
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it can be profitable to do maintenance on some components of the system that are still working but would need maintenance soon anyway.

This can be very interesting if the interruption cost is high, or if the cost of the infrastructure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers
N_C    Number of components
N_Wc   Number of working states for component c
N_PMc  Number of preventive maintenance states for component c
N_CMc  Number of corrective maintenance states for component c

Costs
C_PMc    Cost per stage of preventive maintenance for component c
C_CMc    Cost per stage of corrective maintenance for component c
C_Nc(i)  Terminal cost if component c is in state i

Variables
ic, c ∈ {1, ..., N_C}  State of component c at the current stage
i_{NC+1}               Electricity state at the current stage
jc, c ∈ {1, ..., N_C}  State of component c for the next stage
j_{NC+1}               Electricity state for the next stage
uc, c ∈ {1, ..., N_C}  Decision variable for component c

State and Control Space
xc_k, c ∈ {1, ..., N_C}  State of component c at stage k
xc                       A component state
x_{NC+1,k}               Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k

Probability functions
λc(i)  Failure probability function for component c

Sets
Ω_xc        State space for component c
Ω_x{NC+1}   Electricity state space
Ω_uc(ic)    Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages, with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.

• An interruption cost C_I is considered whenever maintenance, of any kind, is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost C_Nc(i) can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x1_k, ..., x_{NC,k}, x_{NC+1,k})ᵀ    (9.2)

xc_k, c ∈ {1, ..., N_C}, represents the state of component c, and x_{NC+1,k} represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is noted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u1_k, u2_k, ..., u_{NC,k})ᵀ    (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., N_C}: Ω_uc(ic) = {0, 1} if ic ∈ {W0, ..., W_NWc}; ∅ otherwise
9.2.4.3 Transition Probabilities

The component state variables xc are independent of the electricity state x_{NC+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)    (9.4)
  = P((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) · P_k(j_{NC+1}, i_{NC+1})    (9.5)

The transition probabilities of the electricity state, P_k(j_{NC+1}, i_{NC+1}), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The component state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., N_C}: xc_k ∈ {W1, ..., W_NWc} and U_k = 0, then

P((j1, ..., j_NC), 0, (i1, ..., i_NC)) = ∏_{c=1}^{N_C} P(jc, 0, ic)

Case 2
If one of the components is in maintenance, or a decision of preventive maintenance is made, then

P((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) = ∏_{c=1}^{N_C} P^c

with

P^c = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., W_NWc}
P^c = 1             if uc = 0, ic ∈ {W1, ..., W_NWc} and jc = ic
P^c = 0             else
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., N_C}: xc_k ∈ {W1, ..., W_NWc}, then

C((j1, ..., j_NC), 0, (i1, ..., i_NC)) = G · Ts · C_E(i_{NC+1}, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) = C_I + Σ_{c=1}^{N_C} C^c

with

C^c = C_CMc  if ic ∈ {CM1, ..., CM_NCMc} or jc = CM1
C^c = C_PMc  if ic ∈ {PM1, ..., PM_NPMc} or jc = PM1
C^c = 0      else
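The two cases above can be sketched in code: the joint transition probability is a product over components, where working components that are not being maintained are frozen when the system is down (Case 2). The component data below (two components, N_PMc = N_CMc = 2, so PM and CM last one stage) are illustrative assumptions.

```python
from itertools import product
from math import prod

# Sketch of the multi-component transition probability (Cases 1 and 2).
# Two components with N_PMc = N_CMc = 2 are illustrative assumptions.
NC = 2
NW = [2, 2]                                    # working states W0..W2
lam = [[0.1, 0.2, 0.3], [0.05, 0.1, 0.2]]      # lambda_c(W_q)

def working(s):
    return s.startswith("W")

def p_component(c, j, u, i):
    """Single-component transition P(j, u, i), as in Table 9.1."""
    if working(i):
        if u == 1:
            return 1.0 if j == "PM1" else 0.0
        q = int(i[1:])
        if j == f"W{min(q + 1, NW[c])}":       # ageing (W_NW stays)
            return 1.0 - lam[c][q]
        return lam[c][q] if j == "CM1" else 0.0
    return 1.0 if j == "W0" else 0.0           # PM1 / CM1 -> W0

def p_joint(j, u, i):
    if all(working(i[c]) for c in range(NC)) and not any(u):
        # Case 1: all working, no maintenance -> independent ageing
        return prod(p_component(c, j[c], 0, i[c]) for c in range(NC))
    # Case 2: system down -> maintained components move, the others freeze
    factors = []
    for c in range(NC):
        if u[c] == 1 or not working(i[c]):
            factors.append(p_component(c, j[c], u[c], i[c]))
        else:
            factors.append(1.0 if j[c] == i[c] else 0.0)
    return prod(factors)

# sanity check: the joint probabilities sum to one over all successors
comp_states = ["W0", "W1", "W2", "PM1", "CM1"]
total = sum(p_joint(j, (0, 0), ("W1", "W0"))
            for j in product(comp_states, repeat=2))
```

For example, with both components working and no maintenance decided, the probability that both simply age is (1 − λ1(W1)) · (1 − λ2(W0)); with component 2 in CM1 and preventive maintenance decided on component 1, the joint transition to (PM1, W0) is certain.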
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space rather than an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement is the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Stochastic time to repair: the time to repair is not deterministic in reality. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The ADP methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
$J^*_4(0) = \phi(0) = 0$

Stage 3:
$J^*_3(0) = J^*(H) = C(3,0,0) = 4$, $\quad u^*_3(0) = u^*(H) = 0$
$J^*_3(1) = J^*(I) = C(3,1,0) = 2$, $\quad u^*_3(1) = u^*(I) = 0$
$J^*_3(2) = J^*(J) = C(3,2,0) = 7$, $\quad u^*_3(2) = u^*(J) = 0$

Stage 2:
$J^*_2(0) = J^*(E) = \min\{J^*_3(0) + C(2,0,0),\ J^*_3(1) + C(2,0,1)\} = \min\{4+2,\ 2+5\} = 6$
$u^*_2(0) = u^*(E) = \arg\min_{u \in \{0,1\}}\{J^*_3(0) + C(2,0,0),\ J^*_3(1) + C(2,0,1)\} = 0$

$J^*_2(1) = J^*(F) = \min\{J^*_3(0) + C(2,1,0),\ J^*_3(1) + C(2,1,1),\ J^*_3(2) + C(2,1,2)\} = \min\{4+7,\ 2+3,\ 7+2\} = 5$
$u^*_2(1) = u^*(F) = \arg\min_{u \in \{0,1,2\}}\{J^*_3(0) + C(2,1,0),\ J^*_3(1) + C(2,1,1),\ J^*_3(2) + C(2,1,2)\} = 1$

$J^*_2(2) = J^*(G) = \min\{J^*_3(1) + C(2,2,1),\ J^*_3(2) + C(2,2,2)\} = \min\{2+1,\ 7+2\} = 3$
$u^*_2(2) = u^*(G) = \arg\min_{u \in \{1,2\}}\{J^*_3(1) + C(2,2,1),\ J^*_3(2) + C(2,2,2)\} = 1$

Stage 1:
$J^*_1(0) = J^*(B) = \min\{J^*_2(0) + C(1,0,0),\ J^*_2(1) + C(1,0,1)\} = \min\{6+4,\ 5+6\} = 10$
$u^*_1(0) = u^*(B) = \arg\min_{u \in \{0,1\}}\{J^*_2(0) + C(1,0,0),\ J^*_2(1) + C(1,0,1)\} = 0$

$J^*_1(1) = J^*(C) = \min\{J^*_2(0) + C(1,1,0),\ J^*_2(1) + C(1,1,1),\ J^*_2(2) + C(1,1,2)\} = \min\{6+2,\ 5+1,\ 3+3\} = 6$
$u^*_1(1) = u^*(C) = \arg\min_{u \in \{0,1,2\}}\{J^*_2(0) + C(1,1,0),\ J^*_2(1) + C(1,1,1),\ J^*_2(2) + C(1,1,2)\} = 1 \text{ or } 2$

$J^*_1(2) = J^*(D) = \min\{J^*_2(1) + C(1,2,1),\ J^*_2(2) + C(1,2,2)\} = \min\{5+5,\ 3+2\} = 5$
$u^*_1(2) = u^*(D) = \arg\min_{u \in \{1,2\}}\{J^*_2(1) + C(1,2,1),\ J^*_2(2) + C(1,2,2)\} = 2$

Stage 0:
$J^*_0(0) = J^*(A) = \min\{J^*_1(0) + C(0,0,0),\ J^*_1(1) + C(0,0,1),\ J^*_1(2) + C(0,0,2)\} = \min\{10+2,\ 6+4,\ 5+3\} = 8$
$u^*_0(0) = u^*(A) = \arg\min_{u \in \{0,1,2\}}\{J^*_1(0) + C(0,0,0),\ J^*_1(1) + C(0,0,1),\ J^*_1(2) + C(0,0,2)\} = 2$
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306. SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
$u$: Decision variable
$u^*_k(i)$: Optimal decision at stage $k$ for state $i$
The recursion finishes when the first stage is reached
5.4 The Curse of Dimensionality
Consider a finite horizon stochastic dynamic programming problem with:

- $N$ stages;
- $N_X$ state variables, where the size of the set for each state variable is $S$;
- $N_U$ control variables, where the size of the set for each control variable is $A$.

The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem thus increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
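To make the growth concrete, the operation count can be sketched as follows (the per-variable set sizes $S = 10$ and $A = 5$, and the 52-stage horizon, are hypothetical numbers chosen purely for illustration):

```python
def operation_count(n_stages, n_state_vars, n_control_vars, s=10, a=5):
    """Operation count N * S**(2*N_X) * A**(N_U) for one finite-horizon
    SDP solve, as stated above (illustrative set sizes s and a)."""
    return n_stages * s ** (2 * n_state_vars) * a ** n_control_vars

# Hypothetical weekly maintenance model over one year (N = 52 stages):
# every extra state variable multiplies the work by S**2 = 100.
for n_x in range(1, 5):
    print(n_x, operation_count(52, n_x, 1))
```

With one state variable the count is 26 000 operations; with four it already exceeds $2.6 \cdot 10^{10}$, which is why the grouping and approximation ideas discussed in this thesis matter.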
5.5 Ideas for a Maintenance Optimization Model
In this section, possible state variables for maintenance models based on SDP are discussed.
5.5.1 Age and Deterioration States
The failure probability of components is often modelled as a function of time, so a possible state variable for the component is its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.
Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model on its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is no consumption, some generation units are stopped; this time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on time, if the system dynamics are not stationary).
This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.
For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. This would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, for the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution of the problem has the form $\pi = (\mu, \mu, \mu, \ldots)$, where $\mu$ is a function mapping the state space into the control space. For each
$i \in \Omega_X$, $\mu(i)$ is an admissible control for the state $i$: $\mu(i) \in \Omega_U(i)$.
The objective is to find the optimal policy $\mu^*$, which minimizes the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are paid.
$$J^*(X_0) = \min_{\mu} E\left[\lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k)\right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $\quad k = 0, 1, \ldots, N-1$.
$\mu$: Decision policy
$J^*(i)$: Optimal cost-to-go function for state $i$
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor $\alpha$ ($0 < \alpha < 1$): the cost at stage $k$ has the form $\alpha^k \cdot C_{ij}(u)$. As $C_{ij}(u)$ is bounded, the infinite sum converges (a decreasing geometric progression).
$$J^*(X_0) = \min_{\mu} E\left[\lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k)\right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $\quad k = 0, 1, \ldots, N-1$.
$\alpha$: Discount factor
Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.
To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize
$$J^* = \min_{\mu} E\left[\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k)\right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $\quad k = 0, 1, \ldots, N-1$.
6.2 Optimality Equations
The optimality equations are formulated using the transition probability function $P_{ij}(u) = P(j, u, i)$.
The stationary policy $\mu^*$ that solves an IHSDP shortest path problem is a solution of the Bellman equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot [C_{ij}(u) + J^*(j)], \quad \forall i \in \Omega_X$$

$J_\mu(i)$: Cost-to-go function of policy $\mu$ starting from state $i$
$J^*(i)$: Optimal cost-to-go function for state $i$
For an IHSDP discounted problem, the optimality equation is:

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot [C_{ij}(u) + \alpha \cdot J^*(j)], \quad \forall i \in \Omega_X$$
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and $\frac{1}{1-\alpha}$.
For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined for the algorithm.
An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
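As an illustration of the discounted value iteration of this section, consider the following sketch on a hypothetical two-state, two-action MDP (all transition probabilities and costs are invented for the example; state 1 can be read as a failed state and action 1 as a replacement):

```python
# Hypothetical two-state (0 = healthy, 1 = failed), two-action discounted MDP.
# Conventions: P[u][i][j] = P(j, u, i) and C[u][i][j] = C(j, u, i).
P = {0: [[0.9, 0.1], [0.6, 0.4]],   # action 0: "do nothing"
     1: [[1.0, 0.0], [1.0, 0.0]]}   # action 1: "replace" (back to healthy)
C = {0: [[0.0, 5.0], [0.0, 20.0]],
     1: [[3.0, 3.0], [10.0, 10.0]]}
alpha = 0.9                          # discount factor

def value_iteration(P, C, alpha, tol=1e-8):
    """Iterate the discounted optimality equation until two successive
    cost-to-go estimates differ by less than tol in every state."""
    n = len(next(iter(P.values())))
    J = [0.0] * n
    while True:
        J_new = [min(sum(P[u][i][j] * (C[u][i][j] + alpha * J[j]) for j in range(n))
                     for u in P)
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(J, J_new)) < tol:
            return J_new
        J = J_new

J_star = value_iteration(P, C, alpha)
# Greedy policy with respect to J_star (one control per state).
policy = [min(P, key=lambda u: sum(P[u][i][j] * (C[u][i][j] + alpha * J_star[j])
                                   for j in range(len(J_star))))
          for i in range(len(J_star))]
```

For these invented numbers the greedy policy is to do nothing in the healthy state and to replace in the failed state.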
6.4 The Policy Iteration Algorithm
Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively; the process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy $\mu_0$. It can then be described by the following steps.
Step 1: Policy Evaluation

If $\mu_{q+1} = \mu_q$, stop the algorithm. Otherwise, $J_{\mu_q}(i)$, the solution of the following linear system, is calculated:

$$J_{\mu_q}(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot [C(j, \mu_q(i), i) + J_{\mu_q}(j)]$$

$q$: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy $\mu_q$.
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J_{\mu_q}(j)]$$

Go back to the policy evaluation step.
The process stops when $\mu_{q+1} = \mu_q$.
At each iteration the algorithm improves the policy. If the initial policy $\mu_0$ is already good, the algorithm will converge quickly to the optimal solution.
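The two steps above can be sketched as follows, on a hypothetical two-state, two-action discounted MDP (all numbers are invented for illustration); the evaluation step solves the linear system exactly with a small Gaussian elimination:

```python
# Hypothetical two-state, two-action discounted MDP.
# Conventions: P[u][i][j] = P(j, u, i) and C[u][i][j] = C(j, u, i).
P = {0: [[0.9, 0.1], [0.6, 0.4]],
     1: [[1.0, 0.0], [1.0, 0.0]]}
C = {0: [[0.0, 5.0], [0.0, 20.0]],
     1: [[3.0, 3.0], [10.0, 10.0]]}

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (enough for small systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def policy_iteration(P, C, alpha, mu0):
    n = len(mu0)
    mu = mu0[:]
    while True:
        # Step 1: policy evaluation -- solve (I - alpha * P_mu) J = c_mu exactly
        A = [[(1.0 if i == j else 0.0) - alpha * P[mu[i]][i][j] for j in range(n)]
             for i in range(n)]
        c = [sum(P[mu[i]][i][j] * C[mu[i]][i][j] for j in range(n)) for i in range(n)]
        J = solve_linear(A, c)
        # Step 2: policy improvement
        mu_new = [min(P, key=lambda u: sum(P[u][i][j] * (C[u][i][j] + alpha * J[j])
                                           for j in range(n)))
                  for i in range(n)]
        if mu_new == mu:   # the policy is a solution of its own improvement
            return mu, J
        mu = mu_new

mu_star, J_mu = policy_iteration(P, C, alpha=0.9, mu0=[1, 1])
```

Starting even from a poor initial policy (replace everywhere), the sketch terminates after a couple of improvement steps.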
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation can be computationally intensive.
An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations $M$ to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu_k}(i)$ that must be chosen higher than the real value $J_{\mu_k}(i)$.
While $m \ge 0$ do:

$$J^m_{\mu_k}(i) = \sum_{j \in \Omega_X} P(j, \mu_k(i), i) \cdot [C(j, \mu_k(i), i) + J^{m+1}_{\mu_k}(j)], \quad \forall i \in \Omega_X$$

$m \leftarrow m - 1$

$m$: Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when $m = 0$, and $J_{\mu_k}$ is approximated by $J^0_{\mu_k}$.
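A minimal sketch of this approximate evaluation step (the one-state example and its numbers are hypothetical): the exact linear solve is replaced by $M$ backups under the fixed policy, starting from an initial value chosen above the true one:

```python
def evaluate_policy_approx(P, C, alpha, mu, J_init, M):
    """Modified policy iteration evaluation step: M value-iteration-style
    backups under the fixed policy mu, instead of an exact linear solve.
    Conventions: P[u][i][j] = P(j, u, i), C[u][i][j] = C(j, u, i)."""
    n = len(mu)
    J = J_init[:]
    for _ in range(M):
        J = [sum(P[mu[i]][i][j] * (C[mu[i]][i][j] + alpha * J[j]) for j in range(n))
             for i in range(n)]
    return J

# Hypothetical one-state example: staying in the same state costs 1 per stage,
# so the true discounted cost-to-go is 1 / (1 - 0.9) = 10. The initial value
# 100 is deliberately chosen above the real value, as required above.
P = {0: [[1.0]]}
C = {0: [[1.0]]}
J_hat = evaluate_policy_approx(P, C, alpha=0.9, mu=[0], J_init=[100.0], M=200)
```

Each backup contracts the error by the factor $\alpha$, so after $M$ iterations the estimate is within $\alpha^M$ of the initial error from the exact value.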
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy $\mu$ and a reference state $\bar{X} \in \Omega_X$, there is a unique $\lambda_\mu$ and vector $h_\mu$ such that

$$h_\mu(\bar{X}) = 0$$

$$\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot [C(j, \mu(i), i) + h_\mu(j)], \quad \forall i \in \Omega_X$$
This $\lambda_\mu$ is the average cost-to-go of the stationary policy $\mu$. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation
$$\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)], \quad \forall i \in \Omega_X$$

$$\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)], \quad \forall i \in \Omega_X$$
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; the method is then called relative value iteration. $\bar{X}$ is an arbitrary reference state and $h_0(i)$ is chosen arbitrarily.
$$H_k = \min_{u \in \Omega_U(\bar{X})} \sum_{j \in \Omega_X} P(j, u, \bar{X}) \cdot [C(j, u, \bar{X}) + h_k(j)]$$

$$h_{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h_k(j)] - H_k, \quad \forall i \in \Omega_X$$

$$\mu_{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h_k(j)], \quad \forall i \in \Omega_X$$
The sequence $h_k$ will converge if the Markov decision process is unichain, and the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
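The three update equations can be sketched as follows on a hypothetical single-action, two-state unichain MDP (the numbers are invented for illustration; for them the exact average cost per stage is 4):

```python
def relative_value_iteration(P, C, ref_state=0, iters=500):
    """Sketch of relative value iteration for an average cost-to-go problem.
    Conventions: P[u][i][j] = P(j, u, i), C[u][i][j] = C(j, u, i).
    Returns the estimated average cost (lambda) and differential costs h."""
    n = len(next(iter(P.values())))
    h = [0.0] * n
    H = 0.0
    for _ in range(iters):
        def backup(i):
            # min over u of sum_j P(j,u,i) * [C(j,u,i) + h_k(j)]
            return min(sum(P[u][i][j] * (C[u][i][j] + h[j]) for j in range(n))
                       for u in P)
        H = backup(ref_state)                  # H_k at the reference state
        h = [backup(i) - H for i in range(n)]  # h_{k+1}(i)
    return H, h

# Hypothetical two-state, single-action unichain MDP (invented numbers):
P = {0: [[0.5, 0.5], [0.5, 0.5]]}
C = {0: [[1.0, 3.0], [5.0, 7.0]]}
lam, h = relative_value_iteration(P, C)
```

Because the chain is aperiodic and unichain, the iterates stabilize quickly, with $h(\bar{X}) = 0$ by construction.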
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialization: $\bar{X}$ can be chosen arbitrarily.
Step 1: Policy evaluation. If $\lambda_{q+1} = \lambda_q$ and $h_{q+1}(i) = h_q(i)$ $\forall i \in \Omega_X$, stop the algorithm.
Else, solve the system of equations:

$$h_q(\bar{X}) = 0$$

$$\lambda_q + h_q(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot [C(j, \mu_q(i), i) + h_q(j)], \quad \forall i \in \Omega_X$$
Step 2: Policy improvement.

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h_q(j)], \quad \forall i \in \Omega_X$$

$q = q + 1$
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case,

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \alpha \cdot J^*(j)], \quad \forall i \in \Omega_X$$

and $J^*(i)$ is the solution of the following linear programming model:

Maximize $\sum_{i \in \Omega_X} J(i)$

Subject to $J(i) \le \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \alpha \cdot J(j)]$, $\quad \forall u \in \Omega_U(i),\ \forall i \in \Omega_X$.
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If $n$ and $m$ denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of $n$ and $m$. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, the algorithm will converge quite fast if the initial policy $\mu_0$ is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).
SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and the actions are not made continuously (that kind of problem belongs to optimal control theory).
SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function over a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), to be able to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form $(X_k, X_{k+1}, U_k, C_k)$.
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods presented in Section 7.4 are extensions of the methods presented in Section 7.2: they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form $(X_k, X_{k+1}, U_k, C_k)$: $X_{k+1}$ is the observed state after choosing the control $U_k$ in state $X_k$, and $C_k = C(X_{k+1}, U_k, X_k)$ is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities $P(j, u, i)$ and costs $C(j, u, i)$, if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy $\mu$ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a way similar to the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory $(X_0, \ldots, X_N)$ has been generated according to the policy $\mu$, and the sequence of transition costs $C(X_k, X_{k+1}) = C(X_k, X_{k+1}, \mu(X_k))$ has been observed.
The cost-to-go resulting from the trajectory starting from the state $X_k$ is

$$V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})$$

$V(X_k)$: Cost-to-go of a trajectory starting from state $X_k$
If a certain number of trajectories has been generated and the state $i$ has been visited $K$ times in these trajectories, $J(i)$ can be estimated by

$$J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)$$

$V(i_m)$: Cost-to-go of the trajectory starting from state $i$ at its $m$-th visit
A recursive form of the method can be formulated:

$$J(i) \leftarrow J(i) + \gamma \cdot [V(i_m) - J(i)], \quad \gamma = 1/m$$

with $m$ the number of the trajectory. From a trajectory point of view:

$$J(X_k) \leftarrow J(X_k) + \gamma_{X_k} \cdot [V(X_k) - J(X_k)]$$

$\gamma_{X_k}$ corresponding to $1/m$, where $m$ is the number of times $X_k$ has already been visited by trajectories.
With the preceding algorithm, $V(X_k)$ must be calculated from the whole trajectory, and can thus only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation $V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1})$.
At each transition of the trajectory, the cost-to-go function of the states visited so far is updated. Assume the $l$-th transition has just been generated. Then $J(X_k)$ is updated for all the states that have been visited previously during the trajectory:

$$J(X_k) \leftarrow J(X_k) + \gamma_{X_k} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)], \quad \forall k = 0, \ldots, l$$
TD($\lambda$). A generalization of the preceding algorithm is TD($\lambda$), where a constant $\lambda < 1$ is introduced:

$$J(X_k) \leftarrow J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)], \quad \forall k = 0, \ldots, l$$
Note that TD(1) is the same as the policy evaluation by simulation. Another special case is $\lambda = 0$; the TD(0) algorithm updates only the current state:

$$J(X_l) \leftarrow J(X_l) + \gamma_{X_l} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)]$$
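As an illustration, here is a sketch of TD(0) on a hypothetical stochastic shortest path problem (invented for the example): from state $i$ the system moves deterministically to $i-1$ at cost 1, and state 0 is the cost-free terminal state, so the exact cost-to-go is $J(i) = i$. The step size $\gamma_{X_k} = 1/m$ is the one described above.

```python
import random

def td0(n_states, episodes=5000):
    """TD(0) evaluation of the (only) policy on a chain i -> i-1 with cost 1;
    state 0 is the cost-free terminal state, so the true J(i) = i."""
    J = [0.0] * (n_states + 1)       # J[0] stays 0: terminal state
    visits = [0] * (n_states + 1)
    for _ in range(episodes):
        x = random.randint(1, n_states)              # random starting state
        while x != 0:
            x_next, cost = x - 1, 1.0                # observed transition sample
            visits[x] += 1
            gamma = 1.0 / visits[x]                  # step size 1/m, as above
            J[x] += gamma * (cost + J[x_next] - J[x])  # TD(0) update
            x = x_next
    return J
```

After a few thousand simulated trajectories the estimates are close to the exact values $J(i) = i$, without any transition probabilities having been estimated.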
Q-factors. Once $J_{\mu_k}(i)$ has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

$$Q_{\mu_k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J_{\mu_k}(j)]$$

Note that $P(j, u, i)$ and $C(j, u, i)$ must be known for this step. The improved policy is

$$\mu_{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q_{\mu_k}(i, u)$$

This is in fact an approximate version of the policy iteration algorithm, since $J_{\mu_k}$ and $Q_{\mu_k}$ have been estimated from samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J^*(j)] \quad (7.1)$$
40
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3). Q(i, u) can be initialized arbitrarily. For each sample (X_k, U_k, X_{k+1}, C_k), do:

U_k = argmin_{u∈ΩU(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 − γ)·Q(X_k, U_k) + γ·[C(X_{k+1}, U_k, X_k) + min_{u∈ΩU(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The trade-off exploration/exploitation. Convergence of the algorithm to the optimal solution would require that all pairs (x, u) be tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, in which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
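The Q-learning update (7.3) combined with an ε-greedy exploration/exploitation trade-off can be sketched as follows. The three-state example, the function names and all parameter values are hypothetical illustrations, not thesis data.

```python
import random

def q_learning(n_states, n_actions, step, n_steps=20000, epsilon=0.1,
               terminal=None):
    """Tabular Q-learning with an epsilon-greedy exploration policy.

    step(state, action) returns (next_state, cost).  With probability
    epsilon a random control is explored; otherwise the greedy control
    argmin_u Q(x, u) is exploited.  The step size gamma for a pair
    (x, u) is 1/(number of updates of that pair), as for TD."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    counts = [[0] * n_actions for _ in range(n_states)]
    x = 0
    for _ in range(n_steps):
        if x == terminal:
            x = 0                                   # restart an episode
        if random.random() < epsilon:               # exploration
            u = random.randrange(n_actions)
        else:                                       # exploitation (greedy)
            u = min(range(n_actions), key=lambda a: Q[x][a])
        x_next, cost = step(x, u)
        counts[x][u] += 1
        gamma = 1.0 / counts[x][u]
        target = cost + (0.0 if x_next == terminal else min(Q[x_next]))
        Q[x][u] = (1.0 - gamma) * Q[x][u] + gamma * target
        x = x_next
    return Q

# Hypothetical example: from state 0, control 0 goes through state 1
# (total cost 1 + 1 = 2) while control 1 reaches the terminal state 2
# directly at cost 3, so the optimal greedy control in state 0 is u = 0
def step(x, u):
    if x == 0:
        return (1, 1.0) if u == 0 else (2, 3.0)
    return 2, 1.0

Q = q_learning(3, 2, step, terminal=2)
```

Here ε fixes the fraction of exploratory controls; the estimated Q-factors converge to Q*(0, 0) = 2, Q*(0, 1) = 3 and Q*(1, ·) = 1, so the greedy policy derived from Q is optimal.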
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line a model of the transition probabilities and the cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; however, for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jμ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the table representation investigated previously, Jμ(i) was stored for all values of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, the error between the true function and the approximated one, Jμ(i) − J̃(i, r), should be minimized.

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem can be
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.
bull Evaluate the performance of the approximated function using a test set
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no real training set exists. The training set is obtained either by simulation or from real-time samples; this is already an approximation of the real function.
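As a minimal illustration of the idea, a linear approximation J̃(i, r) = r0 + r1·i can be fitted by least squares to samples of Jμ(i), so that only the vector r = (r0, r1) is stored instead of one table entry per state. The function name and the sample values below are hypothetical.

```python
def fit_linear_cost_to_go(samples):
    """Least-squares fit of the approximation J~(i, r) = r0 + r1 * i to
    samples (i, J_mu(i)).  Only the parameter vector r = (r0, r1) is
    stored, instead of one table entry per state."""
    n = len(samples)
    sx = sum(i for i, _ in samples)
    sy = sum(j for _, j in samples)
    sxx = sum(i * i for i, _ in samples)
    sxy = sum(i * j for i, j in samples)
    r1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    r0 = (sy - r1 * sx) / n                           # intercept
    return r0, r1

# Hypothetical noisy samples of a cost-to-go that grows with the age i
samples = [(0, 0.1), (1, 1.9), (2, 4.1), (3, 5.9), (4, 8.0)]
r0, r1 = fit_linear_cost_to_go(samples)
J_approx = lambda i: r0 + r1 * i   # generalizes to unsampled states
```

The fitted function can then be evaluated for states that never appeared in the samples, which is exactly the generalization property required of the approximator.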
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] a SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a three-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm; the model is proved to be unichain before the algorithm is applied. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.
The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined for deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; it could then be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality: it means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature considered only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model; several classical methods are possible
- Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor
- Discounted cost: short-term maintenance optimization; policy iteration (PI) is faster in general
- Shortest path: linear programming makes additional constraints possible, but the tractable state space is more limited than with VI and PI

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application: optimization of inspection-based maintenance
- Method: same as MDP
- Advantages/disadvantages: complex (average cost-to-go approach)

Approximate Dynamic Programming
- Characteristics: can handle larger state spaces than classical MDP methods
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages/disadvantages: can work without an explicit model
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices. Conversely, if a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was adopted for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

N_E    number of electricity scenarios
N_W    number of working states for the component
N_PM   number of preventive maintenance states for the component
N_CM   number of corrective maintenance states for the component

Costs

C_E(s, k)  electricity price at stage k in electricity state s
C_I        cost per stage for interruption
C_PM       cost per stage of preventive maintenance
C_CM       cost per stage of corrective maintenance
C_N(i)     terminal cost if the component is in state i

Variables

i1  component state at the current stage
i2  electricity state at the current stage
j1  possible component state for the next stage
j2  possible electricity state for the next stage

State and Control Space

x1_k  component state at stage k
x2_k  electricity state at stage k

Probability functions

λ(t)  failure rate of the component at age t
λ(i)  failure rate of the component in state W_i

Sets

Ω_x1    component state space
Ω_x2    electricity state space
Ω_U(i)  decision space for state i

States notations

W   working state
PM  preventive maintenance state
CM  corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, …, N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted C_E(s, k), k = 0, 1, …, N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector X_k is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (N_X = 2). The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k),  x1_k ∈ Ω_x1,  x2_k ∈ Ω_x2   (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age T_max is reached; in this case T_max can correspond, for example, to the time when λ(t) > 50 % for t > T_max. This approach was implemented. In both cases the corresponding number of W states is N_W = T_max/Ts, or the closest integer.
[Figure 9.1 here: state transition diagram with working states W0–W4, preventive maintenance state PM1 and corrective maintenance states CM1, CM2; arcs Wq → Wq+1 with probability 1 − Ts·λ(q), arcs Wq → CM1 with probability Ts·λ(q), and deterministic arcs (probability 1) through the maintenance states.]

Figure 9.1: Example of Markov decision process for one component with N_CM = 3, N_PM = 2, N_W = 4. Solid line: u = 0; dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ω_x1 = {W0, …, W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, …, W_NW, PM1, …, PM_(N_PM−1), CM1, …, CM_(N_CM−1)}
Electricity scenario state
Electricity scenarios are associated with the state variable x2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, …, S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2 here: electricity price (SEK/MWh) versus stage for three scenarios, with prices between 200 and 500 SEK/MWh around stages k−1, k, k+1.]

Figure 9.2: Example of electricity scenarios, N_E = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, …, W_NW}, ∅ otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · P_k(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q·Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, P_k(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u   j1      P(j1, u, i1)
Wq, q ∈ {0, …, N_W − 1}     0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, …, N_W − 1}     0   CM1     λ(Wq)
W_NW                        0   W_NW    1 − λ(W_NW)
W_NW                        0   CM1     λ(W_NW)
Wq, q ∈ {0, …, N_W}         1   PM1     1
PMq, q ∈ {1, …, N_PM − 2}   ∅   PMq+1   1
PM_(N_PM−1)                 ∅   W0      1
CMq, q ∈ {1, …, N_CM − 2}   ∅   CMq+1   1
CM_(N_CM−1)                 ∅   W0      1
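The structure of Table 9.1 can be sketched as a small routine that assembles the stationary component transition matrix for a given decision u. The function name, the state indexing and the numerical failure probabilities below are hypothetical choices for this sketch; lam[q] stands for the per-stage failure probability Ts·λ(Wq).

```python
def component_transition_matrix(n_w, n_pm, n_cm, lam, u):
    """Assemble the stationary component transition matrix of Table 9.1
    for decision u (0: no preventive maintenance, 1: preventive
    maintenance).  States are indexed W0..W_NW, then PM1..PM_{NPM-1},
    then CM1..CM_{NCM-1}; lam[q] is the per-stage failure probability
    Ts*lambda(Wq) in working state Wq.  A sketch, not the thesis code."""
    n = (n_w + 1) + (n_pm - 1) + (n_cm - 1)
    W = lambda q: q                        # indices of W0..W_NW
    PM = lambda q: n_w + q                 # indices of PM1..PM_{NPM-1}
    CM = lambda q: n_w + n_pm - 1 + q      # indices of CM1..CM_{NCM-1}
    P = [[0.0] * n for _ in range(n)]
    for q in range(n_w + 1):               # working states
        if u == 1:
            # preventive replacement: go to PM1 (or W0 if N_PM = 1)
            P[W(q)][PM(1) if n_pm > 1 else W(0)] = 1.0
        else:
            age = W(q + 1) if q < n_w else W(n_w)   # W_NW is absorbing
            P[W(q)][age] += 1.0 - lam[q]
            P[W(q)][CM(1) if n_cm > 1 else W(0)] += lam[q]
    for q in range(1, n_pm - 1):           # forced PM transitions
        P[PM(q)][PM(q + 1)] = 1.0
    if n_pm > 1:
        P[PM(n_pm - 1)][W(0)] = 1.0
    for q in range(1, n_cm - 1):           # forced CM transitions
        P[CM(q)][CM(q + 1)] = 1.0
    if n_cm > 1:
        P[CM(n_cm - 1)][W(0)] = 1.0
    return P
```

For the sizes of Figure 9.1 (N_W = 4, N_PM = 2, N_CM = 3) this yields an 8 × 8 stochastic matrix whose rows sum to one, which is a convenient sanity check of the table.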
Table 9.2: Example of transition matrices for electricity scenarios

P1_E = | 1  0  0 |      P2_E = | 1/3  1/3  1/3 |      P3_E = | 0.6  0.2  0.2 |
       | 0  1  0 |             | 1/3  1/3  1/3 |             | 0.2  0.6  0.2 |
       | 0  0  1 |             | 1/3  1/3  1/3 |             | 0.2  0.2  0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0     1     2     3     4     5     6     7     8     9     10    11
P_k(j2, i2):  P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
9.1.4.4 Cost Function
The costs associated with the possible transitions are of different kinds:

• reward for electricity generation: G·Ts·C_E(i2, k) (depends on the electricity scenario state i2 and the stage k);
• cost for maintenance: C_CM or C_PM;
• cost for interruption: C_I.

Moreover, a terminal cost noted C_N could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4; notice that i2 is a state variable.

A possible terminal cost C_N(i) is defined for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                          u   j1      C_k(j, u, i)
Wq, q ∈ {0, …, N_W − 1}     0   Wq+1    G·Ts·C_E(i2, k)
Wq, q ∈ {0, …, N_W − 1}     0   CM1     C_I + C_CM
W_NW                        0   W_NW    G·Ts·C_E(i2, k)
W_NW                        0   CM1     C_I + C_CM
Wq                          1   PM1     C_I + C_PM
PMq, q ∈ {1, …, N_PM − 2}   ∅   PMq+1   C_I + C_PM
PM_(N_PM−1)                 ∅   W0      C_I + C_PM
CMq, q ∈ {1, …, N_CM − 2}   ∅   CMq+1   C_I + C_CM
CM_(N_CM−1)                 ∅   W0      C_I + C_CM
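Putting the pieces together, the one-component model can be solved by backward induction (finite horizon value iteration). The sketch below uses a deliberately simplified instance: a single electricity scenario (constant production reward), N_PM = 1 (so that PM1 coincides with W0) and N_CM = 2. All function names and numerical values are hypothetical illustrations, not thesis data.

```python
def backward_induction(n_stages, states, actions, P, cost, terminal_cost):
    """Finite horizon value iteration (backward induction):
    J_k(i) = min_u sum_j P(j | u, i) * [ C_k(j, u, i) + J_{k+1}(j) ].
    P(u, i) returns {next_state: probability}; cost(k, j, u, i) is the
    transition cost (negative values represent production rewards)."""
    J = {i: terminal_cost(i) for i in states}
    policy = []
    for k in reversed(range(n_stages)):
        Jk, uk = {}, {}
        for i in states:
            best = None
            for u in actions(i):
                q = sum(p * (cost(k, j, u, i) + J[j])
                        for j, p in P(u, i).items())
                if best is None or q < best[0]:
                    best = (q, u)
            Jk[i], uk[i] = best
        J = Jk
        policy = [uk] + policy
    return J, policy

# Hypothetical instance: working states W0-W2 (ageing chain), one repair
# state CM1 (N_CM = 2), and N_PM = 1 so that PM1 coincides with W0
LAM = {"W0": 0.05, "W1": 0.2, "W2": 0.5}   # per-stage failure probability

def actions(i):
    return (0, 1) if i.startswith("W") else (0,)  # CM1: forced transition

def P(u, i):
    if u == 1 or i == "CM1":
        return {"W0": 1.0}                  # replacement / end of repair
    nxt = {"W0": "W1", "W1": "W2", "W2": "W2"}[i]
    return {nxt: 1.0 - LAM[i], "CM1": LAM[i]}

def cost(k, j, u, i):
    if i == "CM1":
        return 50.0                          # C_I + C_CM during repair
    if u == 1:
        return 20.0                          # C_I + C_PM
    if j == "CM1":
        return 50.0                          # failure during the stage
    return -10.0                             # production reward G*Ts*C_E

J, policy = backward_induction(12, ["W0", "W1", "W2", "CM1"],
                               actions, P, cost, lambda i: 0.0)
# At stage 0 the aged state W2 is replaced while the new state W0 is kept
```

With a stage-dependent electricity price the same recursion applies unchanged, since cost(k, j, u, i) already receives the stage index k.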
9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would require maintenance soon.

This can be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers

N_C     number of components
N_W^c   number of working states for component c
N_PM^c  number of preventive maintenance states for component c
N_CM^c  number of corrective maintenance states for component c

Costs

C_PM^c    cost per stage of preventive maintenance for component c
C_CM^c    cost per stage of corrective maintenance for component c
C_N^c(i)  terminal cost if component c is in state i

Variables

i^c, c ∈ {1, …, N_C}  state of component c at the current stage
i^(N_C+1)             electricity state at the current stage
j^c, c ∈ {1, …, N_C}  state of component c at the next stage
j^(N_C+1)             electricity state at the next stage
u^c, c ∈ {1, …, N_C}  decision variable for component c

State and Control Space

x^c_k, c ∈ {1, …, N_C}  state of component c at stage k
x^c                     a component state
x^(N_C+1)_k             electricity state at stage k
u^c_k                   maintenance decision for component c at stage k

Probability functions

λ^c(i)  failure probability function for component c

Sets

Ω_x^c        state space for component c
Ω_x^(N_C+1)  electricity state space
Ω_u^c(i^c)   decision space for component c in state i^c
9.2.3 Assumptions
• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λ^c(t) for component c ∈ {1, …, N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CM^c stages, with a cost of C_CM^c per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PM^c stages, with a cost of C_PM^c per stage.

• An interruption cost C_I is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• A terminal cost C_N^c can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

X_k = (x^1_k, …, x^(N_C)_k, x^(N_C+1)_k)   (9.2)

x^c_k, c ∈ {1, …, N_C}, represents the state of component c, and x^(N_C+1)_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to N_CM^c and N_PM^c. The number of W states for each component c, N_W^c, is decided in the same way as for one component. The state space related to component c is noted Ω_x^c:

x^c_k ∈ Ω_x^c = {W0, …, W_(N_W^c), PM1, …, PM_(N_PM^c − 1), CM1, …, CM_(N_CM^c − 1)}

Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:
u^c_k = 0: no preventive maintenance on component c
u^c_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u^1_k, u^2_k, …, u^(N_C)_k)   (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, …, N_C}: Ω_u^c(i^c) = {0, 1} if i^c ∈ {W0, …, W_(N_W^c)}, ∅ otherwise
9.2.4.3 Transition Probability

The component state variables x^c are independent of the electricity state x^(N_C+1). Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)   (9.4)
= P((j^1, …, j^(N_C)), (u^1, …, u^(N_C)), (i^1, …, i^(N_C))) · P_k(j^(N_C+1), i^(N_C+1))   (9.5)

The transition probabilities of the electricity state, P_k(j^(N_C+1), i^(N_C+1)), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component states transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, …, N_C}: i^c ∈ {W1, …, W_(N_W^c)} and u^c = 0, then

P((j^1, …, j^(N_C)), 0, (i^1, …, i^(N_C))) = ∏_{c=1}^{N_C} P(j^c, 0, i^c)
Case 2
If one of the components is in maintenance, or if preventive maintenance is decided for at least one component:

P((j^1, …, j^(N_C)), (u^1, …, u^(N_C)), (i^1, …, i^(N_C))) = ∏_{c=1}^{N_C} P^c

with P^c =
  P(j^c, 1, i^c)  if u^c = 1 or i^c ∉ {W1, …, W_(N_W^c)}
  1               if u^c = 0, i^c ∈ {W1, …, W_(N_W^c)} and j^c = i^c
  0               otherwise
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, …, N_C}: i^c ∈ {W1, …, W_(N_W^c)} and u^c = 0, then

C((j^1, …, j^(N_C)), 0, (i^1, …, i^(N_C))) = G·Ts·C_E(i^(N_C+1), k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

C((j^1, …, j^(N_C)), (u^1, …, u^(N_C)), (i^1, …, i^(N_C))) = C_I + Σ_{c=1}^{N_C} C^c

with C^c =
  C_CM^c  if i^c ∈ {CM1, …, CM_(N_CM^c)} or j^c = CM1
  C_PM^c  if i^c ∈ {PM1, …, PM_(N_PM^c)} or j^c = PM1
  0       otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas of issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of dynamic programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (reinforcement learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The policy iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the value iteration algorithm can be better. Linear programming can also be used if additional constraints need to be included in the model. Approximate dynamic programming methods are necessary for large state spaces.
A maintenance model based on finite horizon stochastic dynamic programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of dynamic programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is that it can optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of dynamic programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using dynamic programming for finite horizon problems are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which can approximate a finite horizon model provided the system is stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with monitoring of several parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
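The backward recursion above is easy to check mechanically. The following sketch re-runs the value iteration with the stage costs C(k, i, u) read off from the calculations in this appendix (the control u is the successor state chosen at stage k):

```python
# Value iteration (backward recursion) for the shortest path example.
# C[(k, i, u)] is the stage cost read off from the calculations above;
# the control u is the successor state chosen at stage k.

C = {(0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
     (1, 0, 0): 4, (1, 0, 1): 6,
     (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
     (1, 2, 1): 5, (1, 2, 2): 2,
     (2, 0, 0): 2, (2, 0, 1): 5,
     (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
     (2, 2, 1): 1, (2, 2, 2): 2,
     (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7}

J = {(4, 0): 0}                              # terminal cost: phi(0) = 0
policy = {}
for k in range(3, -1, -1):                   # stages 3, 2, 1, 0
    for i in sorted({i for (kk, i, u) in C if kk == k}):
        options = {u: C[(k, i, u)] + J[(k + 1, u)]
                   for (kk, ii, u) in C if (kk, ii) == (k, i)}
        policy[(k, i)] = min(options, key=options.get)
        J[(k, i)] = options[policy[(k, i)]]

print(J[(0, 0)], policy[(0, 0)])   # 8 2 : optimal cost 8, first control u = 2
```

The script reproduces the shortest path cost of 8 from node A and the optimal first control u*_0(0) = 2.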
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A-H. Mohamed. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
5.5.2 Forecasts
Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model over its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for each scenario, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.
Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. If there is little consumption, some generation units are stopped; this time can be used for the maintenance of the power plant.
Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing the maintenance actions of offshore wind farms.
5.5.3 Time Lags
An important assumption of a DP model is that the dynamics of the system depend only on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).
This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on only a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.
For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.
In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function, and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.
An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {µ, µ, µ, ...}, where µ is a function mapping the state space to the control space: for i ∈ Ω_X, µ(i) is an admissible control for the state i, µ(i) ∈ Ω_U(i).
The objective is to find the optimal policy µ*: it should minimize the cost-to-go function.
To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.
Stochastic shortest path models

Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is inevitably reached. When this state is reached, the system remains in it and no further costs are incurred.

J*(X_0) = min_µ E[lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, µ(X_k), X_k)]

Subject to: X_{k+1} = f(X_k, µ(X_k), ω(X_k, µ(X_k))), k = 0, 1, ..., N−1

µ: Decision policy
J*(i): Optimal cost-to-go function for state i
Discounted problems

Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost at stage k then has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum converges (it is dominated by a decreasing geometric progression).

J*(X_0) = min_µ E[lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, µ(X_k), X_k)]

Subject to: X_{k+1} = f(X_k, µ(X_k), ω(X_k, µ(X_k))), k = 0, 1, ..., N−1

α: Discount factor
Average cost per stage problems

Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:

J* = min_µ E[lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, µ(X_k), X_k)]

Subject to: X_{k+1} = f(X_k, µ(X_k), ω(X_k, µ(X_k))), k = 0, 1, ..., N−1
6.2 Optimality Equations
The optimality equations are formulated using the transition probability function P(j, u, i).

The stationary policy µ*, solution of an IHSDP shortest path problem, is a solution of the Bellman equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)], ∀i ∈ Ω_X

J_µ(i): Cost-to-go function of policy µ starting from state i
J*(i): Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined for the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
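As an illustration, the value iteration algorithm for a discounted model can be sketched in a few lines. The two-state maintenance MDP below (state 0 = "good", state 1 = "worn"; control 0 = do nothing, control 1 = repair) is hypothetical, with invented costs and probabilities:

```python
# Value iteration for a discounted IHSDP model, illustrated on a small
# hypothetical two-state maintenance MDP (state 0 = "good", 1 = "worn";
# control 0 = "do nothing", 1 = "repair").  All numbers are invented.

ALPHA = 0.9   # discount factor

# P[u][i][j]: probability of moving from state i to j under control u
P = {0: [[0.8, 0.2], [0.0, 1.0]],   # do nothing: the asset may deteriorate
     1: [[1.0, 0.0], [1.0, 0.0]]}   # repair: back to "good" with certainty

# C[u][i]: expected one-stage cost of control u in state i
C = {0: [0.0, 2.0],                 # a worn asset causes production losses
     1: [4.0, 4.0]}                 # repair cost

def value_iteration(tol=1e-10):
    J = [0.0, 0.0]
    while True:
        J_new = [min(C[u][i] + ALPHA * sum(P[u][i][j] * J[j] for j in range(2))
                     for u in (0, 1)) for i in range(2)]
        if max(abs(a - b) for a, b in zip(J, J_new)) < tol:
            break
        J = J_new
    policy = [min((0, 1), key=lambda u: C[u][i] + ALPHA *
                  sum(P[u][i][j] * J[j] for j in range(2))) for i in range(2)]
    return J, policy

J, mu = value_iteration()
print(J, mu)   # J close to [6.10, 9.49]; policy [0, 1]
```

For this toy model the iteration settles on the policy "do nothing while good, repair when worn", with J ≈ (6.10, 9.49).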
6.4 The Policy Iteration Algorithm
Given a policy µ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the actual policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy µ_0. Then it can be described by the following steps:

Step 1: Policy Evaluation

If µ_{q+1} = µ_q, stop the algorithm. Else, compute J_{µ_q}(i), the solution of the following linear system:

J_{µ_q}(i) = Σ_{j∈Ω_X} P(j, µ_q(i), i) · [C(j, µ_q(i), i) + J_{µ_q}(j)]

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy µ_q.

Step 2: Policy Improvement

A new policy is obtained as in the value iteration algorithm:

µ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{µ_q}(j)]

Go back to the policy evaluation step.

The process stops when µ_{q+1} = µ_q.
At each iteration the algorithm always improves the policy. If the initial policy µ_0 is already good, then the algorithm will converge fast to the optimal solution.
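The two-step scheme can be sketched as follows, again on a hypothetical two-state maintenance MDP (invented numbers) and for the discounted case; step 1 solves the linear policy-evaluation system exactly with a small Gaussian elimination helper:

```python
# Policy iteration for the discounted case, sketched on a hypothetical
# two-state maintenance MDP (all numbers invented).  Step 1 solves the
# linear policy-evaluation system exactly; step 2 improves the policy;
# the loop stops when the policy reproduces itself.

ALPHA = 0.9
P = {0: [[0.8, 0.2], [0.0, 1.0]], 1: [[1.0, 0.0], [1.0, 0.0]]}
C = {0: [0.0, 2.0], 1: [4.0, 4.0]}
N = 2

def solve(A, b):
    """Gauss-Jordan elimination for a small dense system A x = b."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def policy_iteration(mu0=(0, 0)):
    mu = list(mu0)
    while True:
        # Step 1: policy evaluation, solve J = C_mu + ALPHA * P_mu J
        A = [[(1.0 if i == j else 0.0) - ALPHA * P[mu[i]][i][j]
              for j in range(N)] for i in range(N)]
        b = [C[mu[i]][i] for i in range(N)]
        J = solve(A, b)
        # Step 2: policy improvement
        mu_new = [min((0, 1), key=lambda u: C[u][i] + ALPHA *
                      sum(P[u][i][j] * J[j] for j in range(N)))
                  for i in range(N)]
        if mu_new == mu:
            return J, mu
        mu = mu_new

J, mu = policy_iteration()
print(mu)   # [0, 1]: do nothing while "good", repair when "worn"
```

Starting from the all-"do nothing" policy, a single improvement step already reaches the optimal policy, which then reproduces itself.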
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{µ_k}(i), which must be chosen higher than the real value J_{µ_k}(i).
While m ≥ 0, do:

J^m_{µ_k}(i) = Σ_{j∈Ω_X} P(j, µ_k(i), i) · [C(j, µ_k(i), i) + J^{m+1}_{µ_k}(j)], ∀i ∈ Ω_X

m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{µ_k} is approximated by J^0_{µ_k}.
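A sketch of the idea, on a hypothetical two-state maintenance MDP (invented numbers): the evaluation step runs only M backups of the fixed-policy operator. For simplicity, this variant warm-starts each evaluation from the previous estimate rather than from an upper bound J^M as in the text:

```python
# Modified policy iteration: the evaluation step runs only M value-iteration
# sweeps instead of solving the linear system.  Hypothetical two-state
# maintenance MDP (state 0 = "good", 1 = "worn"; numbers invented).

ALPHA = 0.9
P = {0: [[0.8, 0.2], [0.0, 1.0]], 1: [[1.0, 0.0], [1.0, 0.0]]}
C = {0: [0.0, 2.0], 1: [4.0, 4.0]}

def evaluate(mu, J, M=20):
    """Approximate policy evaluation: M sweeps of the fixed-policy backup."""
    for _ in range(M):
        J = [C[mu[i]][i] + ALPHA * sum(P[mu[i]][i][j] * J[j] for j in range(2))
             for i in range(2)]
    return J

def modified_policy_iteration():
    mu, J = [0, 0], [0.0, 0.0]
    for _ in range(50):                      # outer improvement loop
        J = evaluate(mu, J)
        mu_new = [min((0, 1), key=lambda u: C[u][i] + ALPHA *
                      sum(P[u][i][j] * J[j] for j in range(2)))
                  for i in range(2)]
        if mu_new == mu:
            break
        mu = mu_new
    return J, mu

J, mu = modified_policy_iteration()
print(mu)   # [0, 1]
```

The trade-off is visible in `M`: M = 1 essentially gives value iteration, while a very large M recovers exact policy iteration.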
6.6 Average Cost-to-go Problems
The methods presented in the preceding sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy µ and a state X̄ ∈ Ω_X, there are a unique λ_µ and a vector h_µ such that:

h_µ(X̄) = 0

λ_µ + h_µ(i) = Σ_{j∈Ω_X} P(j, µ(i), i) · [C(j, µ(i), i) + h_µ(j)], ∀i ∈ Ω_X

This λ_µ is the average cost-to-go for the stationary policy µ. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X

µ*(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X̄ is an arbitrary state and h_0(i) is chosen arbitrarily.

H_k = min_{u∈Ω_U(X̄)} Σ_{j∈Ω_X} P(j, u, X̄) · [C(j, u, X̄) + h_k(j)]

h_{k+1}(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)] − H_k, ∀i ∈ Ω_X

µ_{k+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)], ∀i ∈ Ω_X

The sequence h_k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is infinite in theory.
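A minimal sketch of relative value iteration on a hypothetical two-state average-cost maintenance model (all numbers invented; the transitions are chosen so that every stationary policy is unichain, as the convergence theory requires):

```python
# Relative value iteration on a small hypothetical average-cost maintenance
# model (state 0 = "good", 1 = "worn"; control 0 = do nothing, 1 = repair).
# The numbers are invented; repairing an already good asset does not stop
# deterioration, so every stationary policy generates a unichain process.

P = {0: [[0.8, 0.2], [0.0, 1.0]],    # do nothing: the asset may deteriorate
     1: [[0.8, 0.2], [1.0, 0.0]]}    # repair: a worn asset becomes good
C = {0: [0.0, 2.0], 1: [4.0, 4.0]}   # worn asset: production loss; repair: 4
REF = 0                              # arbitrary reference state (X bar)

def backup(h, i):
    return min(C[u][i] + sum(P[u][i][j] * h[j] for j in range(2))
               for u in (0, 1))

h = [0.0, 0.0]
for _ in range(500):
    H = backup(h, REF)               # offset taken at the reference state
    h = [backup(h, i) - H for i in range(2)]

lam = backup(h, REF) - h[REF]        # converges to the average cost per stage
mu = [min((0, 1), key=lambda u: C[u][i] + sum(P[u][i][j] * h[j]
          for j in range(2))) for i in range(2)]
print(round(lam, 3), mu)             # 0.667 [0, 1]
```

For this toy model the optimal policy is again "repair only when worn", with an average cost per stage of 2/3 regardless of the starting state.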
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.

Initialisation: X̄ can be chosen arbitrarily.

Step 1: Policy evaluation

If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω_X, stop the algorithm. Else, solve the system of equations:

h_q(X̄) = 0

λ_q + h_q(i) = Σ_{j∈Ω_X} P(j, µ_q(i), i) · [C(j, µ_q(i), i) + h_q(j)], ∀i ∈ Ω_X

Step 2: Policy improvement

µ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_q(j)], ∀i ∈ Ω_X

q = q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case, the optimal cost-to-go function satisfies

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X

and J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω_X} J(i)

Subject to: J(i) ≤ Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J(j)], ∀i ∈ Ω_X, ∀u ∈ Ω_U(i)
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms over the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].
Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy µ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Processes
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, but the actions are not taken continuously (problems with continuous decisions refer to optimal control theory).
SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_{k+1}, U_k, X_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in a finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation
Assume a trajectory (X_0, ..., X_N) has been generated according to the policy µ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_{k+1}, µ(X_k), X_k) has been observed.

The cost-to-go resulting from the trajectory, starting from the state X_k, is:

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i at its m-th visit
A recursive form of the method can be formulated:

J(i) ← J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the visit.

From a trajectory point of view:

J(X_k) ← J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, so the updates can only be made once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).
At each transition of the trajectory, the cost-to-go estimates of the previously visited states are updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states visited so far during the trajectory:

J(X_k) ← J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) ← J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0, for which only the current state is updated. The TD(0) algorithm is:

J(X_l) ← J(X_l) + γ_{X_l} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]
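The TD(0) update can be sketched on a tiny invented stochastic shortest path example: under the fixed policy, state 0 costs 1 and moves to state 1 or the terminal state with probability 1/2 each, and state 1 costs 2 and always terminates, so the exact cost-to-go is J(0) = J(1) = 2:

```python
# TD(0) policy evaluation on a tiny, invented stochastic shortest path
# example.  Under the fixed policy: state 0 costs 1 and moves to state 1 or
# the terminal state T with probability 1/2 each; state 1 costs 2 and always
# terminates.  The exact cost-to-go is J(0) = 2 and J(1) = 2.

import random
random.seed(1)

TERM = "T"

def step(x):
    """Simulate one transition (next state, cost) under the fixed policy."""
    if x == 0:
        return (1 if random.random() < 0.5 else TERM), 1.0
    return TERM, 2.0

J = {0: 0.0, 1: 0.0, TERM: 0.0}
visits = {0: 0, 1: 0}

for _ in range(20000):                  # simulated trajectories from state 0
    x = 0
    while x != TERM:
        x_next, cost = step(x)
        visits[x] += 1
        gamma = 1.0 / visits[x]         # step size 1/m, as in the text
        # TD(0) update: move J(x) toward the one-step estimate
        J[x] += gamma * (cost + J[x_next] - J[x])
        x = x_next

print(round(J[0], 1), round(J[1], 1))   # close to 2.0 and 2.0
```

The updates are made transition by transition, without waiting for the whole trajectory, which is the practical advantage of TD(0) over plain policy evaluation by simulation.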
Q-factors
Once J_{µ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Q_{µ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{µ_k}(j)]

Note that C(j, u, i) must be known.

The improved policy is:

µ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q_{µ_k}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J_{µ_k} and Q_{µ_k} have been estimated using the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain:

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) ← (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The trade-off between exploration and exploitation
The convergence of the algorithm to the optimal solution would require that all the pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
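A sketch of Q-learning with an ε-greedy exploration/exploitation trade-off, on a hypothetical two-state maintenance MDP (invented numbers). The transition matrix is used only to simulate samples; the update rule itself never reads the probabilities. A small constant step size is used for simplicity, whereas a decreasing step size, as in the text, is required for exact convergence:

```python
# Q-learning on a small hypothetical two-state maintenance MDP
# (state 0 = "good", 1 = "worn"; control 0 = do nothing, 1 = repair;
# all numbers invented).  The transition matrix P only generates samples;
# the Q-update itself is model-free.

import random
random.seed(0)

ALPHA = 0.9      # discount factor
EPS = 0.2        # exploration probability (epsilon-greedy)
STEP = 0.05      # small constant learning step, for simplicity
P = {0: [[0.8, 0.2], [0.0, 1.0]], 1: [[1.0, 0.0], [1.0, 0.0]]}
C = {0: [0.0, 2.0], 1: [4.0, 4.0]}

Q = {(i, u): 0.0 for i in (0, 1) for u in (0, 1)}

x = 0
for _ in range(500000):
    # epsilon-greedy trade-off: explore with probability EPS, else exploit
    if random.random() < EPS:
        u = random.choice((0, 1))
    else:
        u = min((0, 1), key=lambda a: Q[(x, a)])
    x_next = 0 if random.random() < P[u][x][0] else 1   # simulated transition
    target = C[u][x] + ALPHA * min(Q[(x_next, 0)], Q[(x_next, 1)])
    Q[(x, u)] += STEP * (target - Q[(x, u)])            # Q-learning update
    x = x_next

mu = [min((0, 1), key=lambda u: Q[(i, u)]) for i in (0, 1)]
print(mu)   # the greedy policy recovered from the learned Q-factors
```

For this toy model the learned greedy policy is "do nothing while good, repair when worn", the same answer an exact dynamic programming solution of the model would give.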
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the previous section for each sample of experience.

- Building on-line a model of the transition probabilities and the cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderately sized problems. However, for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J^μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J^μ. In the table representation investigated previously, J^μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.
Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).
There are many possible methods for function approximation. This field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
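A minimal sketch of such an approximation: instead of storing a table entry per state, the cost-to-go is represented as J̃(i, r) = φ(i)·r with a small parameter vector r fitted by least squares to noisy samples of J^μ. The feature choice, the "true" function and all numbers below are invented for illustration:

```python
import numpy as np

def phi(i):
    """Feature vector for state i (a modelling choice; here polynomial)."""
    return np.array([1.0, i, i ** 2])

rng = np.random.default_rng(0)
states = rng.integers(0, 100, size=200)                    # sampled states
targets = 5.0 + 0.3 * states + rng.normal(0.0, 1.0, 200)   # noisy J^mu samples

Phi = np.stack([phi(i) for i in states])
r, *_ = np.linalg.lstsq(Phi, targets, rcond=None)          # fit parameter vector r

def J_approx(i):
    """Approximate cost-to-go: only the 3 numbers in r are stored."""
    return float(phi(i) @ r)
```

Only r (three parameters) is stored instead of one table entry per state, and J_approx also generalizes to states that were never sampled.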
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for the short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original maintenance time of each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates
are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can only be done on one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failures and deterioration failures, each modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. Preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the calculated state probabilities and the optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for the monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence
of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require an explicit model of the system; they learn from samples and can adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solved with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

• Finite Horizon Dynamic Programming. Characteristics: the model can be non-stationary. Possible application in maintenance optimization: short-term maintenance scheduling. Method: value iteration. Disadvantage: limited state space (number of components).

• Markov Decision Processes. Characteristics: stationary model, with three classical approaches. Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor. Discounted: short-term maintenance optimization; policy iteration (PI) is faster in general. Shortest path: linear programming allows possible additional constraints, but the state space is more limited than with VI and PI.

• Approximate Dynamic Programming for MDP. Characteristics: can handle large state spaces. Possible application: same as MDP, for larger systems. Methods: TD-learning, Q-learning. Advantage: can work without an explicit model.

• Semi-Markov Decision Processes. Characteristics: can optimize the inspection interval. Possible application: optimization of inspection-based maintenance. Method: same as MDP. Disadvantage: complex (average cost-to-go approach).
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, to make its principle easier to understand.

The price of electricity is considered an important factor that can influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to
do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for one component
NCM   Number of corrective maintenance states for one component
Costs
CE(s, k)   Electricity cost at stage k for the electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i
Variables
i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage
State and Control Space
x1_k   Component state at stage k
x2_k   Electricity state at stage k
Probability function
λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi
Sets
Ω_x1     Component state space
Ω_x2     Electricity state space
Ω_U(i)   Decision space for state i
States notations
W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The transition probabilities at each stage are assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector X_k is composed of two state variables: x1_k for the state of the component (its age), and x2_k for the electricity scenario (N_X = 2). The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k)^T,  x1_k ∈ Ω_x1, x2_k ∈ Ω_x2   (9.1)

Ω_x1 is the set of possible states for the component, and Ω_x2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always carried out. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case, Tmax can for example correspond to the time after which λ(t) > 50%. This second approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
[Figure 9.1: Example of a Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1. Under u = 0, each working state Wq moves to the next working state with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); the PM and CM chains return to W0 with probability 1.]
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_(NPM−1), CM1, ..., CM_(NCM−1)}
Electricity scenario state
Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserves in a country such as Sweden. Hydropower constitutes a large part of the electricity generation in Sweden, and is moreover a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2: Example of electricity scenarios, NE = 3. Electricity prices (SEK/MWh, roughly 200 to 500) are plotted over stages k−1, k, k+1 for scenarios 1, 2 and 3.]
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance
The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW}
Ω_U(i) = ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(X_k+1 = j | U_k = u, X_k = i)
= P(x1_k+1 = j1, x2_k+1 = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_k+1 = j1 | u_k = u, x1_k = i1) · P(x2_k+1 = j2 | x2_k = i2)
= P(j1, u, i1) · P_k(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, P_k(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                         | u | j1    | P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}     | 0 | Wq+1  | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}     | 0 | CM1   | λ(Wq)
W_NW                       | 0 | W_NW  | 1 − λ(W_NW)
W_NW                       | 0 | CM1   | λ(W_NW)
Wq, q ∈ {0, ..., NW}       | 1 | PM1   | 1
PMq, q ∈ {1, ..., NPM−2}   | ∅ | PMq+1 | 1
PM_(NPM−1)                 | ∅ | W0    | 1
CMq, q ∈ {1, ..., NCM−2}   | ∅ | CMq+1 | 1
CM_(NCM−1)                 | ∅ | W0    | 1

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
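The transition probabilities of Table 9.1 for u = 0 can be assembled programmatically. The sketch below uses invented values for NW, NPM, NCM and for the per-stage failure probabilities Ts·λ(Wq); state indices follow the order W0..W_NW, then PM1.., then CM1.. (under u = 1 every working state would instead move to PM1, which is not built here):

```python
# Build the one-component transition matrix of Table 9.1 under u = 0.
NW, NPM, NCM = 4, 2, 3                 # invented sizes for the example
lam = [0.01, 0.02, 0.05, 0.10, 0.20]   # invented Ts*lambda(Wq), q = 0..NW

n = (NW + 1) + (NPM - 1) + (NCM - 1)   # total number of states
W = list(range(NW + 1))                # indices of W0..W_NW
PM = [NW + 1 + q for q in range(NPM - 1)]
CM = [NW + NPM + q for q in range(NCM - 1)]

P0 = [[0.0] * n for _ in range(n)]     # transition matrix under u = 0
for q in W:
    nxt = min(q + 1, NW)               # age one step; W_NW is absorbing
    P0[q][nxt] = 1 - lam[q]
    P0[q][CM[0]] = lam[q]              # failure during the stage -> CM1
for k, s in enumerate(PM):             # deterministic PM chain, ends in W0
    P0[s][PM[k + 1] if k + 1 < len(PM) else W[0]] = 1.0
for k, s in enumerate(CM):             # deterministic CM chain, ends in W0
    P0[s][CM[k + 1] if k + 1 < len(CM) else W[0]] = 1.0
```

Every row of the resulting matrix sums to one, as required for a transition probability matrix.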
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = [1 0 0; 0 1 0; 0 0 1]
P2_E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]
P3_E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):     0     1     2     3     4     5     6     7     8     9     10    11
P_k(j2, i2):   P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
9.1.4.4 Cost Function
The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k).

• Cost for maintenance: CCM or CPM.

• Cost for interruption: CI.
Moreover, a terminal cost noted CN could be used to penalize deviations from a required state at the end of the time horizon; this option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                         | u | j1    | C_k(j, u, i)
Wq, q ∈ {0, ..., NW−1}     | 0 | Wq+1  | G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}     | 0 | CM1   | CI + CCM
W_NW                       | 0 | W_NW  | G · Ts · CE(i2, k)
W_NW                       | 0 | CM1   | CI + CCM
Wq                         | 1 | PM1   | CI + CPM
PMq, q ∈ {1, ..., NPM−2}   | ∅ | PMq+1 | CI + CPM
PM_(NPM−1)                 | ∅ | W0    | CI + CPM
CMq, q ∈ {1, ..., NCM−2}   | ∅ | CMq+1 | CI + CCM
CM_(NCM−1)                 | ∅ | W0    | CI + CCM
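With the transition probabilities (Table 9.1) and costs (Table 9.4) in place, the finite horizon model is solved by value iteration over the stages, i.e. backward induction. The generic sketch below assumes the data are supplied as arrays P[u][i][j] and C[u][i][j] (placeholders for the quantities above; marking a forbidden action with P[u][i] = None is an implementation choice of this sketch, not thesis notation):

```python
def backward_induction(P, C, N, terminal_cost):
    """Solve J_k(i) = min_u sum_j P(j,u,i) * (C(j,u,i) + J_{k+1}(j)) backwards.
    P[u][i][j], C[u][i][j]: transition probabilities and costs under action u;
    P[u][i] = None marks an action that is not allowed in state i."""
    n = len(P[0])
    J = [list(terminal_cost)]          # J_N = terminal costs
    policy = []
    for k in range(N - 1, -1, -1):     # stages N-1, ..., 0
        Jn, Jk, uk = J[0], [0.0] * n, [0] * n
        for i in range(n):
            best = None
            for u in range(len(P)):
                if P[u][i] is None:    # action not allowed in state i
                    continue
                q = sum(P[u][i][j] * (C[u][i][j] + Jn[j]) for j in range(n))
                if best is None or q < best:
                    best, uk[i] = q, u
            Jk[i] = best
        J.insert(0, Jk)
        policy.insert(0, uk)
    return J, policy                   # J[k][i] and optimal action policy[k][i]
```

J[0][i] then gives the minimal expected cost over the whole horizon from initial state i, and policy[k][i] the optimal decision at stage k, e.g. whether to start preventive maintenance.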
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon anyway.

This can be very interesting if the interruption cost is high, or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c
Costs
CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  Electricity state at the current stage
jc, c ∈ {1, ..., NC}   Possible state of component c for the next stage
jNC+1                  Electricity state for the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c
State and Control Space
xc_k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
xNC+1_k                  Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ω_xc        State space for component c
Ω_xNC+1     Electricity state space
Ω_uc(ic)    Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is performed on the system, whatever the maintenance is.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

X_k = (x1_k, ..., xNC_k, xNC+1_k)^T   (9.2)

where xc_k, c ∈ {1, ..., NC}, represents the state of component c and xNC+1_k represents the electricity state.
Component state space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_(NPMc−1), CM1, ..., CM_(NCMc−1)}
Electricity state space
Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:
57
uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

U_k = (u1_k, u2_k, ..., uNC_k)^T   (9.3)
The decision space for each decision variable is defined by:

for all c ∈ {1, ..., NC}:
Ω_uc(ic) = {0, 1} if ic ∈ {W0, ..., W_NWc}
Ω_uc(ic) = ∅ otherwise
9.2.4.3 Transition Probabilities
The component state variables xc are independent of the electricity state xNC+1. Consequently:

P(X_k+1 = j | U_k = U, X_k = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P_k(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, P_k(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If for all c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc}, then:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)
Case 2
If one of the components is in maintenance, or preventive maintenance is decided for it, then:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P_c

with:
P_c = P(jc, 1, ic) if uc = 1 or ic ∉ {W1, ..., W_NWc}
P_c = 1 if ic ∈ {W1, ..., W_NWc}, uc = 0 and jc = ic
P_c = 0 otherwise
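The product form of the system transition probability (Case 1: all components working, u = 0) can be checked numerically with a minimal sketch; the two per-component 2-state matrices below are invented for illustration:

```python
from itertools import product

# Invented per-component transition matrices under u = 0: P_c[c][ic][jc].
P_c = [
    [[0.9, 0.1], [0.0, 1.0]],  # component 1 (2 states, hypothetical numbers)
    [[0.8, 0.2], [0.0, 1.0]],  # component 2 (2 states, hypothetical numbers)
]

def system_transition(i, j):
    """P((j1,..,jNC), 0, (i1,..,iNC)) = product over c of P(jc, 0, ic)."""
    p = 1.0
    for c, (ic, jc) in enumerate(zip(i, j)):
        p *= P_c[c][ic][jc]
    return p

# the joint probabilities over all successor vectors still sum to one
total = sum(system_transition((0, 0), j) for j in product(range(2), repeat=2))
```

Because each factor is a valid probability distribution over the component's successor states, the joint distribution over successor vectors also sums to one.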
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If for all c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc}, then:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} C_c

with:
C_c = CCMc if ic ∈ {CM1, ..., CM_(NCMc−1)} or jc = CM1
C_c = CPMc if ic ∈ {PM1, ..., PM_(NPMc−1)} or jc = PM1
C_c = 0 otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas that could have an impact on the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Include other types of maintenance actions. In the model, replacement is the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of dynamic programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The policy iteration algorithm is empirically shown to converge fastest; however, for high discount factors the value iteration algorithm can be better. Linear programming can also be used if additional constraints need to be included in the model. Approximate dynamic programming methods are necessary for large state spaces.
A maintenance model based on finite horizon stochastic dynamic programming was proposed to illustrate the theory. An interesting idea of the model is to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of dynamic programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities over a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of dynamic programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The methods have until now mainly been applied to optimal control, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using dynamic programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of the finite horizon model and must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov decision processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of the complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2
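The backward recursion above can be checked with a short script. The cost table below is transcribed from the example, under the assumption (consistent with the notation C(k, i, u)) that the control u indexes the successor state; the node labels A–J are noted in the comments.

```python
# Backward value iteration for the shortest path example.
# C[k][i] maps each admissible control u (assumed to index the successor state)
# to the arc cost C(k, i, u).
C = {
    0: {0: {0: 2, 1: 4, 2: 3}},                 # A
    1: {0: {0: 4, 1: 6},                        # B
        1: {0: 2, 1: 1, 2: 3},                  # C
        2: {1: 5, 2: 2}},                       # D
    2: {0: {0: 2, 1: 5},                        # E
        1: {0: 7, 1: 3, 2: 2},                  # F
        2: {1: 1, 2: 2}},                       # G
    3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},       # H, I, J (single arc to terminal)
}

J = {4: {0: 0}}                 # terminal cost phi(0) = 0
policy = {}
for k in range(3, -1, -1):      # stages 3, 2, 1, 0
    J[k], policy[k] = {}, {}
    for i, arcs in C[k].items():
        # Bellman recursion: J_k(i) = min_u [C(k, i, u) + J_{k+1}(u)]
        u_best = min(arcs, key=lambda u: arcs[u] + J[k + 1][u])
        J[k][i] = arcs[u_best] + J[k + 1][u_best]
        policy[k][i] = u_best

print(J[0][0], policy[0][0])    # optimal cost from A is 8, via u = 2
```

The script reproduces the hand computation, in particular J*_0(0) = 8 with first control u = 2.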
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306. SIS, 2001.

[2] A-H. Mohamed. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Chapter 6
Infinite Horizon Models - Markov Decision Processes
Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details, and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. The infinite horizon can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.
The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter
6.1 Problem Formulation
The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages; this sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy, that is, a policy of the form π = (μ, μ, μ, ...), where μ is a function mapping the state space into the control space.
For every i ∈ Ω_X, μ(i) is an admissible control for the state i: μ(i) ∈ Ω_U(i).

The objective is to find the optimal policy μ*, the one that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.
J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1

μ: decision policy
J*(i): optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1): the cost incurred at stage k has the form α^k · C_ij(u).

Since C_ij(u) is bounded, the infinite sum converges (it is dominated by a decreasing geometric progression).
J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1

α: discount factor
Average cost per stage problems
Infinite horizon problems can sometimes neither be represented with a cost-free termination state nor be discounted.

To keep the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize
J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1
6.2 Optimality Equations
The optimality equations are formulated using the transition probability function P(j, u, i).

The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):
J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + J*(j)]   ∀i ∈ Ω_X

J_μ(i): cost-to-go function of policy μ starting from state i
J*(i): optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + α · J*(j)]   ∀i ∈ Ω_X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively the algorithm should converge to the optimal policy, and it can indeed be shown that it converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm, which terminates after a finite number of iterations.
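As a sketch of how the value iteration recursion can be run in practice, the following applies it to a small discounted MDP; all transition probabilities and costs below are invented for illustration.

```python
import numpy as np

# Value iteration on an invented discounted two-state, two-action MDP.
# P[u][i][j] = P(j | i, u); c[u][i] = expected one-step cost of action u in state i.
P = np.array([[[0.9, 0.1], [0.4, 0.6]],     # action u = 0
              [[0.2, 0.8], [0.1, 0.9]]])    # action u = 1
c = np.array([[1.4, 2.8],
              [1.6, 0.9]])
alpha = 0.9

J = np.zeros(2)
for _ in range(1000):
    # Q(i, u) = c(i, u) + alpha * sum_j P(j | i, u) * J(j)
    Q = (c + alpha * (P @ J)).T             # rows: states i, columns: actions u
    J_new = Q.min(axis=1)
    if np.max(np.abs(J_new - J)) < 1e-10:   # stopping criterion
        break
    J = J_new

policy = Q.argmin(axis=1)
print(J, policy)
```

Because the mapping is an α-contraction, the stopping criterion is met after a finite number of sweeps, and the greedy policy read off the converged Q-values is optimal for this model.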
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively; the process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ0. It can then be described by the following steps:

Step 1: Policy evaluation

If μ_{q+1} = μ_q, stop the algorithm. Otherwise, J_μq(i), the solution of the following linear system, is calculated:

J_μq(i) = Σ_{j∈Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_μq(j)]   ∀i ∈ Ω_X

q: iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.
Step 2: Policy improvement

A new policy is obtained using one step of the value iteration algorithm:

μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_μq(j)]   ∀i ∈ Ω_X

Go back to the policy evaluation step.
The process stops when μ_{q+1} = μ_q.

At each iteration the algorithm improves the policy. If the initial policy μ0 is already good, the algorithm therefore converges quickly to the optimal solution.
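A minimal sketch of the two-step scheme on an invented three-state problem (state 2 is an absorbing, cost-free terminal state): the evaluation step solves the linear system exactly, and the improvement step takes the greedy policy.

```python
import numpy as np

# Invented MDP: P[u][i][j] = P(j | i, u), Cc[u][i][j] = C(j, u, i);
# state 2 is an absorbing, cost-free terminal state.
P = np.array([[[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.0, 0.0, 1.0]],
              [[0.1, 0.5, 0.4], [0.2, 0.3, 0.5], [0.0, 0.0, 1.0]]])
Cc = np.array([[[2.0, 3.0, 1.0], [1.0, 2.0, 4.0], [0.0, 0.0, 0.0]],
               [[4.0, 1.0, 2.0], [3.0, 1.0, 1.0], [0.0, 0.0, 0.0]]])

mu = np.zeros(3, dtype=int)                          # initial policy mu_0
for _ in range(50):
    # Step 1: policy evaluation -- solve J = c_mu + P_mu J on non-terminal states.
    P_mu = P[mu, np.arange(3)]                       # rows P(. | i, mu(i))
    c_mu = (P_mu * Cc[mu, np.arange(3)]).sum(axis=1) # expected one-step cost
    J = np.zeros(3)                                  # terminal state pinned to 0
    J[:2] = np.linalg.solve(np.eye(2) - P_mu[:2, :2], c_mu[:2])
    # Step 2: policy improvement -- greedy policy with respect to J.
    Q = (P * (Cc + J[None, None, :])).sum(axis=2)    # Q[u, i]
    mu_new = Q.argmin(axis=0)
    if np.array_equal(mu_new, mu):                   # policy reproduces itself: stop
        break
    mu = mu_new

print(J, mu)
```

On a finite model the loop terminates after a finite number of iterations, as stated above, because each improvement step yields a policy that is at least as good as the previous one.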
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the true value J_μk(i).
While m ≥ 0, do

J^m_μk(i) = Σ_{j∈Ω_X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_μk(j)]   ∀i ∈ Ω_X

m ← m − 1

m: number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_μk is approximated by J^0_μk.
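A sketch of the truncated evaluation, again on invented numbers: the exact linear solve of the previous section is replaced by M sweeps of the fixed-policy recursion, starting from a value function chosen above the true one.

```python
import numpy as np

# Invented discounted two-state MDP: P[u][i][j] = P(j | i, u),
# c[u][i] = expected one-step cost of control u in state i.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.6, 0.4]]])
c = np.array([[1.2, 2.4],
              [2.2, 1.9]])
alpha = 0.9
M = 20                          # value-iteration sweeps per evaluation step

mu = np.zeros(2, dtype=int)
J = np.full(2, 100.0)           # initial guess chosen above the true cost-to-go
for _ in range(60):
    # Step 1 (approximate evaluation): M sweeps of J <- c_mu + alpha * P_mu J
    for _ in range(M):
        J = c[mu, np.arange(2)] + alpha * P[mu, np.arange(2)] @ J
    # Step 2 (improvement): greedy policy with respect to the approximate J
    Q = c + alpha * (P @ J)     # Q[u, i]
    mu = Q.argmin(axis=0)

print(J, mu)
```

Each sweep is one application of the fixed-policy Bellman operator, so the M inner iterations play exactly the role of the J^m recursion above, without ever solving a linear system.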
6.6 Average Cost-to-go Problems
The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and an arbitrary state X̄ ∈ Ω_X, there is a unique scalar λ_μ and vector h_μ such that

h_μ(X̄) = 0

λ_μ + h_μ(i) = Σ_{j∈Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)]   ∀i ∈ Ω_X

This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost λ* and the optimal policy satisfy the Bellman equation

λ* + h*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ Ω_X

μ*(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ Ω_X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. X̄ is an arbitrary reference state, and h_0(i) is chosen
arbitrarily.
H_k = min_{u∈Ω_U(X̄)} Σ_{j∈Ω_X} P(j, u, X̄) · [C(j, u, X̄) + h_k(j)]

h_{k+1}(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)] − H_k   ∀i ∈ Ω_X

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)]   ∀i ∈ Ω_X
The sequence h_k converges if the Markov decision process is unichain, and the algorithm then converges to the optimal policy. The number of iterations needed is, in theory, infinite.
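The recursion can be sketched as follows, on an invented unichain two-state MDP; c(i, u) denotes the expected one-step cost, and state 0 plays the role of the reference state X̄.

```python
import numpy as np

# Relative value iteration on an invented unichain two-state MDP.
# P[u][i][j] = P(j | i, u); c[u][i] = expected one-step cost.
P = np.array([[[0.6, 0.4], [0.2, 0.8]],
              [[0.9, 0.1], [0.5, 0.5]]])
c = np.array([[1.0, 3.0],
              [2.0, 0.5]])
ref = 0                                  # reference state X-bar
h = np.zeros(2)                          # h_0 chosen arbitrarily
for _ in range(2000):
    Th = (c + P @ h).min(axis=0)         # (T h)(i) = min_u [c(i,u) + sum_j P(j|i,u) h(j)]
    lam = Th[ref]                        # running estimate of the average cost
    h_new = Th - lam                     # relative values, normalized so h(X-bar) = 0
    if np.max(np.abs(h_new - h)) < 1e-10:
        break
    h = h_new

policy = (c + P @ h).argmin(axis=0)
print(lam, h, policy)
```

At convergence, lam estimates the optimal average cost λ* and h the relative value function h*, up to the chosen normalization h(X̄) = 0.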
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm
Initialisation: X̄ can be chosen arbitrarily.

Step 1: Evaluation of the policy

If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω_X, stop the algorithm.

Else, solve the system of equations

h_q(X̄) = 0

λ_q + h_q(i) = Σ_{j∈Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)]   ∀i ∈ Ω_X

Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_q(j)]   ∀i ∈ Ω_X

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP case,

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)]   ∀i ∈ Ω_X
J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω_X} J(i)

Subject to J(i) − α · Σ_{j∈Ω_X} P(j, u, i) · J(j) ≤ Σ_{j∈Ω_X} P(j, u, i) · C(j, u, i)   ∀u ∈ Ω_U(i), ∀i ∈ Ω_X
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m, and is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite, but the actions are not made continuously (that kind of problem belongs to optimal control theory).

SMDP are more complicated than MDP and are not treated in this thesis. Puterman [36] explains how an SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach to machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models become intractable for large state spaces. This chapter presents methods that overcome this problem through approximation; they make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), so as to be able to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning, or Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6: the system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2: they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk+1, Uk, Xk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i), if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.
The cost-to-go resulting from the trajectory, starting from the state Xk, is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V(Xk): cost-to-go of a trajectory starting from state Xk
If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) · Σ_{m=1}^{K} V_m(i)

V_m(i): cost-to-go of the trajectory starting from state i after its mth visit

A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V_m(i) − J(i)], with γ = 1/m, where m is the number of the trajectory.
From a trajectory point of view:

J(Xk) := J(Xk) + γ_Xk · [V(Xk) − J(Xk)]

γ_Xk corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) is calculated from the whole trajectory, and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = C(Xk, Xk+1) + V(Xk+1).

At each transition of the trajectory, the cost-to-go function is then updated for all the states that have been visited previously during the trajectory. Assume that the lth transition has just been generated; then

J(Xk) := J(Xk) + γ_Xk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]   ∀k = 0, ..., l
TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) := J(Xk) + γ_Xk · λ^(l−k) · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]   ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J(Xk) := J(Xk) + γ_Xk · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
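The TD(0) update can be sketched as follows, with step size γ_Xk = 1/m, on an invented two-state chain with terminal state 2; trajectories are simulated under a fixed policy μ.

```python
import random

# TD(0) evaluation of a fixed policy on an invented three-state chain
# (states 0, 1 and a cost-free terminal state 2); numbers are illustrative.
P_mu = {0: [(0.6, 0), (0.3, 1), (0.1, 2)],      # (probability, next state)
        1: [(0.4, 1), (0.6, 2)]}
cost = {(0, 0): 1.0, (0, 1): 2.0, (0, 2): 0.5,  # C(x, y) along the transition
        (1, 1): 1.5, (1, 2): 1.0}

random.seed(0)
J = {0: 0.0, 1: 0.0, 2: 0.0}
visits = {0: 0, 1: 0}
for _ in range(20000):                          # simulated trajectories
    x = 0
    while x != 2:
        r, acc = random.random(), 0.0
        for p, y in P_mu[x]:                    # sample the next state y
            acc += p
            if r <= acc:
                break
        visits[x] += 1
        gamma = 1.0 / visits[x]                 # step size 1/m
        # TD(0): J(x) <- J(x) + gamma * [C(x, y) + J(y) - J(x)]
        J[x] += gamma * (cost[(x, y)] + J[y] - J[x])
        x = y

print(J)   # analytically, J_mu = {0: 4.625, 1: 2.0, 2: 0.0}
```

The estimates converge toward the exact cost-to-go of the policy, which for these numbers can be computed by hand from J(1) = 0.4(1.5 + J(1)) + 0.6·1.0 and the corresponding equation for state 0.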
Q-factors. Once J_μk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_μk(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_μk(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q_μk(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_μk and Q_μk have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do

Uk = argmin_{u∈Ω_U(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈Ω_U(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
The exploration/exploitation trade-off. Convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
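The following sketches tabular Q-learning on an invented two-state problem with terminal state 2. The exploration/exploitation trade-off is implemented here as an ε-greedy rule, a common choice rather than the specific scheme of the text; all transition and cost numbers are illustrative only.

```python
import random

# Tabular Q-learning with an epsilon-greedy exploration/exploitation trade-off.
# Invented problem: states 0, 1, terminal state 2; cost depends on (state, action).
P = {(0, 0): [(0.7, 0), (0.3, 2)], (0, 1): [(0.5, 1), (0.5, 2)],
     (1, 0): [(0.6, 1), (0.4, 2)], (1, 1): [(0.2, 0), (0.8, 2)]}
cost = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.5, (1, 1): 0.5}

def sample_next(x, u):
    r, acc = random.random(), 0.0
    for p, y in P[(x, u)]:
        acc += p
        if r <= acc:
            return y
    return y                             # numerical safety fallback

random.seed(1)
Q = {(x, u): 0.0 for x in (0, 1) for u in (0, 1)}
n = {k: 0 for k in Q}                    # visit counts for the step size
eps = 0.2                                # exploration rate
for _ in range(40000):                   # simulated episodes
    x = random.choice([0, 1])
    while x != 2:
        if random.random() < eps:        # exploration phase
            u = random.choice([0, 1])
        else:                            # exploitation: greedy control
            u = min((0, 1), key=lambda a: Q[(x, a)])
        y = sample_next(x, u)
        n[(x, u)] += 1
        gamma = 1.0 / n[(x, u)]
        # Q(x, u) <- (1 - gamma) Q(x, u) + gamma [C + min_v Q(y, v)]
        target = cost[(x, u)] + (0.0 if y == 2 else min(Q[(y, 0)], Q[(y, 1)]))
        Q[(x, u)] += gamma * (target - Q[(x, u)])
        x = y

print(Q)
```

For these numbers the optimal policy is to take control u = 1 in both states, which the learned Q-factors identify without ever estimating the transition probabilities.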
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system, through simulation and direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; for large state and control spaces, however, they would be too computationally intensive. To overcome this problem, approximation methods can be used to represent the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that is optimized based on the available samples of J_μ. In the tabular representation investigated previously, J_μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem is:

• Determine an adequate structure for the approximated function, and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the supervised learning performed in reinforcement learning is that a real training set does not exist: the training set is obtained either by simulation or from real-time samples. This is already an approximation of the real function.
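As a sketch of the supervised learning step, the following fits a linear approximation J̃(i, r) = rᵀφ(i) by least squares to noisy samples of a cost-to-go function; the "true" function J(i) = 5 − 0.04·i, the feature map and all numbers are invented for illustration.

```python
import numpy as np

# Fit an approximate cost-to-go J~(i, r) = r . phi(i) by least squares from
# noisy samples; the hypothetical J(i) = 5 - 0.04 * i plays the role of J_mu.
rng = np.random.default_rng(0)
states = rng.integers(0, 100, size=500)                          # sampled states
targets = 5.0 - 0.04 * states + rng.normal(0.0, 0.1, size=500)   # noisy V samples

def phi(i):
    """Feature vector for state i: a bias term and the (scaled) state itself."""
    return np.array([1.0, i / 100.0])

Phi = np.stack([phi(i) for i in states])           # training inputs
r, *_ = np.linalg.lstsq(Phi, targets, rcond=None)  # train: min |Phi r - targets|^2

def J_hat(i):
    return phi(i) @ r                              # only r is stored, not a table

print(r)
```

Instead of storing one value per state, only the two-component vector r is kept, and J_hat generalizes the sampled information to states never visited.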
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for the short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs; penalties are defined for deviations from the original time of maintenance of each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] a SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure, during the stage, of a unit not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete-Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar and Asgarpoor [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
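As an illustration of the model-free learning discussed above, the following Q-learning sketch learns a replacement policy purely from simulated transitions. The deterioration chain, probabilities and costs below are hypothetical toy values invented for illustration; they are not taken from any of the reviewed models:

```python
import random

random.seed(1)

# Toy deterioration chain: states 0 (new) .. 3 (failed); actions: keep or replace.
N_STATES, KEEP, REPLACE = 4, 0, 1

def step(state, action):
    """Simulate one transition; returns (next_state, cost). All values hypothetical."""
    if action == REPLACE or state == 3:
        return 0, 50.0                      # replacement / forced repair cost
    if random.random() < 0.3:               # deterioration probability per stage
        nxt = state + 1
        return nxt, (200.0 if nxt == 3 else 0.0)  # failure penalty on reaching 3
    return state, 0.0

Q = {(s, a): 0.0 for s in range(N_STATES) for a in (KEEP, REPLACE)}
alpha, gamma, eps = 0.1, 0.9, 0.2           # learning rate, discount, exploration

state = 0
for _ in range(50000):
    if random.random() < eps:               # epsilon-greedy exploration
        action = random.choice((KEEP, REPLACE))
    else:                                   # greedy: costs are minimized
        action = min((KEEP, REPLACE), key=lambda a: Q[(state, a)])
    nxt, cost = step(state, action)
    best_next = min(Q[(nxt, a)] for a in (KEEP, REPLACE))
    # Q-learning update, computed purely from the sampled transition
    Q[(state, action)] += alpha * (cost + gamma * best_next - Q[(state, action)])
    state = nxt

policy = {s: min((KEEP, REPLACE), key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

Note that the learner never sees the transition probabilities: it only observes sampled transitions, which is exactly the property that makes RL attractive when no explicit model exists.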
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods
Finite Horizon Dynamic Programming:
  Characteristics: the model can be non-stationary.
  Possible application in maintenance optimization: short-term maintenance scheduling.
  Method, advantages/disadvantages: value iteration; limited state space (number of components).

Markov Decision Processes:
  Characteristics: stationary models, solved with classical methods; three possible approaches:
    Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI) can converge fast for a high discount factor.
    Discounted: short-term maintenance optimization; Policy Iteration (PI) is faster in general.
    Shortest path: Linear Programming allows possible additional constraints; the state space is limited for VI and PI.

Semi-Markov Decision Processes:
  Characteristics: can optimize the inspection interval.
  Possible application in maintenance optimization: inspection-based maintenance.
  Method, advantages/disadvantages: same as MDP; more complex (average cost-to-go approach).

Approximate Dynamic Programming:
  Characteristics: can handle large state spaces compared with classical MDP methods.
  Possible application in maintenance optimization: same as MDP, for larger systems.
  Methods, advantages/disadvantages: TD-learning and Q-learning; can work without an explicit model.
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions for the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

NE: Number of electricity scenarios
NW: Number of working states for the component
NPM: Number of preventive maintenance states for one component
NCM: Number of corrective maintenance states for one component
Costs

CE(s, k): Electricity cost at stage k for the electricity state s
CI: Cost per stage for interruption
CPM: Cost per stage of preventive maintenance
CCM: Cost per stage of corrective maintenance
CN(i): Terminal cost if the component is in state i
Variables

i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage
State and Control Space

x1k: Component state at stage k
x2k: Electricity state at stage k
Probability functions

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state Wi
Sets

Ωx1: Component state space
Ωx2: Electricity state space
ΩU(i): Decision space for state i
State notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
- The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
- The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
- If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.
- It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.
- If the system is not working, a cost for interruption CI per stage is considered.
- The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
- NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
- A terminal cost (for stage N) can be used to penalize the terminal stage condition.
- The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undertaken preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
[Figure: MDP state diagram with states W0-W4, PM1, CM1, CM2; each working state Wq moves to CM1 with probability Ts·λ(q) and to the next working state with probability 1 − Ts·λ(q); maintenance states progress deterministically.]

Figure 9.1: Example of a Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0; dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
[Figure: example electricity price curves (roughly 200-500 SEK/MWh) over stages k−1, k, k+1 for three scenarios.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).
The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
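The component state transitions of Table 9.1 can be assembled and sanity-checked programmatically. The sketch below follows Figure 9.1 in taking the per-stage failure probability as Ts · λ(q · Ts); the sizes and the failure rate function lam() are hypothetical example values, not part of the model definition:

```python
# Sketch: one-component transition probabilities (Table 9.1 / Figure 9.1),
# with a check that each (state, decision) row sums to one.
N_W, N_PM, N_CM = 4, 2, 3           # example sizes, as in Figure 9.1
Ts = 1.0 / 52.0                     # stage length: one week, in years

def lam(q):
    """Hypothetical increasing failure rate lambda(W_q) = lambda(q * Ts)."""
    return 0.5 + 0.1 * q            # failures per year (illustrative values)

states = (["W%d" % q for q in range(N_W + 1)]
          + ["PM%d" % q for q in range(1, N_PM)]
          + ["CM%d" % q for q in range(1, N_CM)])

def transitions(state, u):
    """Return {next_state: probability} for decision u (0 or 1)."""
    kind = state[:2].rstrip("0123456789")   # "W", "PM" or "CM"
    q = int(state.lstrip("WPMC"))
    if kind == "W":
        if u == 1:                           # preventive replacement starts
            return {("PM1" if N_PM > 1 else "W0"): 1.0}
        p_fail = Ts * lam(q)                 # constant rate during the stage
        nxt = "W%d" % min(q + 1, N_W)        # ages, or stays in W_NW
        return {nxt: 1.0 - p_fail, "CM1": p_fail}
    # maintenance states progress deterministically, then return to W0
    succ, limit = q + 1, (N_PM if kind == "PM" else N_CM)
    return {("%s%d" % (kind, succ)) if succ <= limit - 1 else "W0": 1.0}
```

Checking that every row of the implied transition matrix sums to one is a cheap way to catch transcription errors before running value iteration on the model.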
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                         u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}     0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}     0    CM1      λ(Wq)
WNW                        0    WNW      1 − λ(WNW)
WNW                        0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}       1    PM1      1
PMq, q ∈ {1, ..., NPM−2}   ∅    PMq+1    1
PMNPM−1                    ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}   ∅    CMq+1    1
CMNCM−1                    ∅    W0       1

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Table 9.2: Example of transition matrices for the electricity scenarios

P1E =
  1    0    0
  0    1    0
  0    0    1

P2E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
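The stage-dependent scenario transitions of Tables 9.2 and 9.3 can be encoded directly. The sketch below checks that each matrix is stochastic and propagates an initial scenario distribution (here, starting in scenario S1) through the 12-stage horizon:

```python
# Transition matrices from Table 9.2 (rows: current scenario i2, columns: next j2)
P1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2 = [[1 / 3, 1 / 3, 1 / 3]] * 3
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Stage-to-matrix assignment from Table 9.3 (12-stage horizon)
schedule = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

for P in (P1, P2, P3):
    for row in P:
        assert abs(sum(row) - 1.0) < 1e-12   # each matrix is stochastic

def propagate(dist, matrices):
    """Push a probability distribution over scenarios through the horizon."""
    for P in matrices:
        dist = [sum(dist[i] * P[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
    return dist

final = propagate([1.0, 0.0, 0.0], schedule)  # start in scenario S1
```

With this particular schedule the uniform-mixing matrix P2E in mid-season makes the end-of-horizon distribution uniform over the three scenarios, regardless of the starting scenario.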
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:
- Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k).
- Cost for maintenance: CCM or CPM.
- Cost for interruption: CI.
Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable. A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                         u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}     0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}     0    CM1      CI + CCM
WNW                        0    WNW      G · Ts · CE(i2, k)
WNW                        0    CM1      CI + CCM
Wq                         1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}   ∅    PMq+1    CI + CPM
PMNPM−1                    ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}   ∅    CMq+1    CI + CCM
CMNCM−1                    ∅    W0       CI + CCM
9.2 Multi-Component Model
In this section the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.
This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rent can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers

NC: Number of components
NWc: Number of working states for component c
NPMc: Number of preventive maintenance states for component c
NCMc: Number of corrective maintenance states for component c
Costs

CPMc: Cost per stage of preventive maintenance for component c
CCMc: Cost per stage of corrective maintenance for component c
CNc(i): Terminal cost if component c is in state i
Variables

ic, c ∈ {1, ..., NC}: State of component c at the current stage
iNC+1: Electricity state at the current stage
jc, c ∈ {1, ..., NC}: State of component c at the next stage
jNC+1: Electricity state at the next stage
uc, c ∈ {1, ..., NC}: Decision variable for component c
State and Control Space

xck, c ∈ {1, ..., NC}: State of component c at stage k
xc: A component state
xNC+1,k: Electricity state at stage k
uck: Maintenance decision for component c at stage k
Probability functions

λc(i): Failure probability function for component c
Sets

Ωxc: State space for component c
ΩxNC+1: Electricity state space
Ωuc(ic): Decision space for component c in state ic
9.2.3 Assumptions
- The system is composed of NC components in series. If one component fails, the whole system fails.
- The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.
- If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.
- It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
- An interruption cost CI is considered whenever maintenance is done on the system.
- The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
- A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.
Component space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)    (9.3)
The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and ∅ otherwise
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏(c=1..NC) P(jc, 0, ic)
Case 2
If one of the components is in maintenance, or a decision of preventive maintenance is made, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏(c=1..NC) Pc

with

Pc = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
Pc = 1              if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic
Pc = 0              otherwise
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.
Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)
Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ(c=1..NC) Cc

with

Cc = CCMc   if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc   if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0      otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas of issues that could impact the model:
- Manpower: it would be interesting to limit the number of maintenance actions possible at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
- Other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
- Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
- Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
- Other forecasting states: it could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge faster; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid untractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The ADP methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
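The backward recursion above can be verified with a short script. The cost table C(k, i, j) is transcribed from the calculations (node names A-J follow the appendix):

```python
# Backward value iteration for the shortest-path example of Appendix A.
# C[k][(i, j)] is the arc cost from state i at stage k to state j at stage k+1,
# transcribed from the calculations above.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},
    1: {(0, 0): 4, (0, 1): 6,
        (1, 0): 2, (1, 1): 1, (1, 2): 3,
        (2, 1): 5, (2, 2): 2},
    2: {(0, 0): 2, (0, 1): 5,
        (1, 0): 7, (1, 1): 3, (1, 2): 2,
        (2, 1): 1, (2, 2): 2},
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},
}
N = 4
J = {N: {0: 0.0}}                       # terminal cost phi(0) = 0
policy = {}
for k in range(N - 1, -1, -1):
    J[k], policy[k] = {}, {}
    states = {i for (i, _) in C[k]}
    for i in sorted(states):
        # Bellman recursion: J_k(i) = min_j { C(k, i, j) + J_{k+1}(j) }
        options = {j: C[k][(i, j)] + J[k + 1][j]
                   for (i2, j) in C[k] if i2 == i}
        j_best = min(options, key=options.get)
        J[k][i], policy[k][i] = options[j_best], j_best
```

Running the recursion reproduces the costs-to-go computed by hand: J*3 = (4, 2, 7), J*2 = (6, 5, 3), J*1 = (10, 6, 5) and J*0(A) = 8, with the first move from A going to state 2 (node D).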
Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] J. Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] M.L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] A. Rangan, D. Ahyagarajan, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] O. Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
For all i ∈ Ω_X, μ(i) is an admissible control for the state i: μ(i) ∈ Ω_U(i).
The objective is to find the optimal policy μ*, that is, the policy that minimizes the cost-to-go function.
To be able to compare different policies, the infinite sum of costs must converge. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.
Stochastic shortest path models
Stochastic shortest path dynamic programming models have a cost-free terminal state that is eventually reached with probability one. When this state is reached, the system remains in it and no further costs are incurred.
J*(X_0) = min_μ E[lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k)]
Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1
μ: Decision policy
J*(i): Optimal cost-to-go function for state i
Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1): the cost incurred at stage k has the form α^k · C_ij(u).
Since C_ij(u) is bounded, the infinite sum converges (it is dominated by a decreasing geometric progression).
J*(X_0) = min_μ E[lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k)]
Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1
α: Discount factor
Average cost per stage problems
Some infinite horizon problems can neither be modelled with a cost-free terminal state nor be discounted.
To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize
J* = min_μ E[lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k)]
Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N − 1
6.2 Optimality Equations
The optimality equations are formulated using the transition probabilities P_ij(u).
The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):
J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + J*(j)], ∀i ∈ Ω_X
J_μ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i
For an IHSDP discounted problem, the optimality equation is
J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + α · J*(j)], ∀i ∈ Ω_X
The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration
To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.
Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown to converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).
For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be chosen to terminate the algorithm.
An alternative is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
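As an illustration, a minimal value iteration sketch for a discounted IHSDP. The two-state, two-action transition probabilities and costs below are made-up numbers, not a model from the thesis:

```python
# Value iteration for a discounted IHSDP (hypothetical two-state MDP).
alpha = 0.9                                   # discount factor
states, controls = [0, 1], [0, 1]
P = {0: {0: [0.8, 0.2], 1: [0.3, 0.7]},       # P[i][u][j]: transition probability
     1: {0: [0.5, 0.5], 1: [0.1, 0.9]}}
Cc = {0: {0: [1.0, 5.0], 1: [2.0, 1.0]},      # Cc[i][u][j]: transition cost
      1: {0: [3.0, 2.0], 1: [0.5, 4.0]}}

J = [0.0, 0.0]
for _ in range(1000):
    J_new = [min(sum(P[i][u][j] * (Cc[i][u][j] + alpha * J[j]) for j in states)
                 for u in controls) for i in states]
    if max(abs(a - b) for a, b in zip(J, J_new)) < 1e-10:   # stopping criterion
        J = J_new
        break
    J = J_new

# Greedy policy extracted from the converged cost-to-go
policy = [min(controls, key=lambda u: sum(P[i][u][j] * (Cc[i][u][j] + alpha * J[j])
                                          for j in states)) for i in states]
```

A threshold on the difference between successive iterates plays the role of the stopping criterion discussed above.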
6.4 The Policy Iteration Algorithm
Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step improves the expected cost-to-go by enhancing the current policy. This two-step procedure is applied iteratively; the process stops when a policy is a solution of its own improvement.
The algorithm starts with an initial policy μ_0 and can then be described by the following steps.
Step 1: Policy Evaluation
If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μq}(i) is calculated as the solution of the following linear system:
J_{μq}(i) = Σ_{j∈Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_{μq}(j)], ∀i ∈ Ω_X
q: Iteration number of the policy iteration algorithm
This is the expected cost-to-go function of the system using the policy μ_q.
Step 2: Policy Improvement
A new policy is obtained by applying one step of the value iteration algorithm:
μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{μq}(j)], ∀i ∈ Ω_X
Then go back to the policy evaluation step.
The process stops when μ_{q+1} = μ_q.
The algorithm improves the policy at each iteration. If the initial policy μ_0 is already good, the algorithm converges quickly to the optimal solution.
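The two steps can be sketched as follows. The small discounted MDP is again hypothetical; for two states, the policy evaluation system can be solved exactly by Cramer's rule:

```python
# Policy iteration on a hypothetical two-state discounted MDP.
alpha = 0.9
states, controls = [0, 1], [0, 1]
P = {0: {0: [0.8, 0.2], 1: [0.3, 0.7]},       # P[i][u][j]
     1: {0: [0.5, 0.5], 1: [0.1, 0.9]}}
Cc = {0: {0: [1.0, 5.0], 1: [2.0, 1.0]},      # Cc[i][u][j]
      1: {0: [3.0, 2.0], 1: [0.5, 4.0]}}

def evaluate(mu):
    """Solve J = c_mu + alpha * P_mu J exactly (2x2 system, Cramer's rule)."""
    c = [sum(P[i][mu[i]][j] * Cc[i][mu[i]][j] for j in states) for i in states]
    a11 = 1 - alpha * P[0][mu[0]][0]; a12 = -alpha * P[0][mu[0]][1]
    a21 = -alpha * P[1][mu[1]][0];    a22 = 1 - alpha * P[1][mu[1]][1]
    det = a11 * a22 - a12 * a21
    return [(c[0] * a22 - a12 * c[1]) / det, (a11 * c[1] - a21 * c[0]) / det]

mu = [0, 0]                                    # initial policy mu_0
while True:
    J = evaluate(mu)                           # step 1: policy evaluation
    mu_new = [min(controls,                    # step 2: policy improvement
                  key=lambda u: sum(P[i][u][j] * (Cc[i][u][j] + alpha * J[j])
                                    for j in states)) for i in states]
    if mu_new == mu:                           # policy solves its own improvement
        break
    mu = mu_new
```

Note that the improvement step is exactly one value iteration backup using the evaluated cost-to-go.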
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.
An alternative is to run, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μk}(i) that must be chosen higher than the true value J_{μk}(i).
While m ≥ 0 do
J^m_{μk}(i) = Σ_{j∈Ω_X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_{μk}(j)], ∀i ∈ Ω_X
m ← m − 1
m: Number of iterations left in the evaluation step of modified policy iteration
The algorithm stops when m = 0, and J_{μk} is approximated by J^0_{μk}.
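A sketch of the modified evaluation step, on the same kind of hypothetical discounted MDP: the exact linear solve is replaced by M value iteration sweeps under the fixed policy, starting from a value function initialized above the true values:

```python
# Modified policy iteration (hypothetical two-state discounted MDP).
alpha, M = 0.9, 20
states, controls = [0, 1], [0, 1]
P = {0: {0: [0.8, 0.2], 1: [0.3, 0.7]},
     1: {0: [0.5, 0.5], 1: [0.1, 0.9]}}
Cc = {0: {0: [1.0, 5.0], 1: [2.0, 1.0]},
      1: {0: [3.0, 2.0], 1: [0.5, 4.0]}}

def approx_evaluate(mu, J_init, M):
    """M sweeps of the fixed-policy backup instead of an exact linear solve."""
    J = list(J_init)
    for _ in range(M):
        J = [sum(P[i][mu[i]][j] * (Cc[i][mu[i]][j] + alpha * J[j])
                 for j in states) for i in states]
    return J

mu, J = [0, 0], [100.0, 100.0]     # J initialized above the true cost-to-go
for _ in range(50):
    J = approx_evaluate(mu, J, M)                  # approximate evaluation
    mu = [min(controls,                            # policy improvement
              key=lambda u: sum(P[i][u][j] * (Cc[i][u][j] + alpha * J[j])
                                for j in states)) for i in states]
```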
6.6 Average Cost-to-go Problems
The methods presented in Sections 5.1-5.4 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X ∈ Ω_X, there is a unique scalar λ_μ and vector h_μ such that
h_μ(X) = 0
λ_μ + h_μ(i) = Σ_{j∈Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)], ∀i ∈ Ω_X
This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation
λ* + h*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X
μ*(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. X is an arbitrary reference state and h^0(i) is chosen arbitrarily.
H^k = min_{u∈Ω_U(X)} Σ_{j∈Ω_X} P(j, u, X) · [C(j, u, X) + h^k(j)]
h^{k+1}(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)] − H^k, ∀i ∈ Ω_X
μ^{k+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)], ∀i ∈ Ω_X
The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
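A relative value iteration sketch on a hypothetical two-state unichain MDP, with X = 0 taken as the reference state:

```python
# Relative value iteration (hypothetical two-state unichain MDP).
states, controls = [0, 1], [0, 1]
P = {0: {0: [0.8, 0.2], 1: [0.3, 0.7]},
     1: {0: [0.5, 0.5], 1: [0.1, 0.9]}}
Cc = {0: {0: [1.0, 5.0], 1: [2.0, 1.0]},
      1: {0: [3.0, 2.0], 1: [0.5, 4.0]}}

def backup(i, h):
    """min over controls of expected one-stage cost plus relative value."""
    return min(sum(P[i][u][j] * (Cc[i][u][j] + h[j]) for j in states)
               for u in controls)

h = [0.0, 0.0]
H = 0.0
for _ in range(2000):
    H = backup(0, h)                          # value at the reference state X = 0
    h = [backup(i, h) - H for i in states]    # relative values; h[0] stays 0

mu = [min(controls, key=lambda u: sum(P[i][u][j] * (Cc[i][u][j] + h[j])
                                      for j in states)) for i in states]
```

After convergence, H approximates the optimal average cost per stage and h the relative cost vector (with h(X) = 0 by construction).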
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.
Initialisation: X can be chosen arbitrarily.
Step 1: Policy Evaluation
If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i), ∀i ∈ Ω_X, stop the algorithm.
Else, solve the system of equations
h^q(X) = 0
λ^q + h^q(i) = Σ_{j∈Ω_X} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + h^q(j)], ∀i ∈ Ω_X
Step 2: Policy Improvement
μ^{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h^q(j)], ∀i ∈ Ω_X
q = q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP, where
J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X
the optimal cost-to-go J*(i) is the solution of the following linear programming model:
Maximize Σ_{i∈Ω_X} J(i)
Subject to J(i) − α · Σ_{j∈Ω_X} P(j, u, i) · J(j) ≤ Σ_{j∈Ω_X} P(j, u, i) · C(j, u, i), ∀u ∈ Ω_U(i), ∀i ∈ Ω_X
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
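The LP can be assembled explicitly. The sketch below uses a hypothetical two-state MDP, builds one constraint row per (state, control) pair, and checks that the value iteration fixed point is feasible and tight at the minimizing controls; an off-the-shelf solver such as scipy.optimize.linprog could then maximize Σ_i J(i) over these rows:

```python
# Assembling the discounted-case LP (hypothetical two-state MDP).
alpha = 0.9
states, controls = [0, 1], [0, 1]
P = {0: {0: [0.8, 0.2], 1: [0.3, 0.7]},
     1: {0: [0.5, 0.5], 1: [0.1, 0.9]}}
Cc = {0: {0: [1.0, 5.0], 1: [2.0, 1.0]},
      1: {0: [3.0, 2.0], 1: [0.5, 4.0]}}

# Value iteration gives J* for the comparison
J = [0.0, 0.0]
for _ in range(2000):
    J = [min(sum(P[i][u][j] * (Cc[i][u][j] + alpha * J[j]) for j in states)
             for u in controls) for i in states]

# One constraint per (state, control) pair:
#   J(i) - alpha * sum_j P(j,u,i) J(j) <= sum_j P(j,u,i) C(j,u,i)
rows = []
for i in states:
    for u in controls:
        coeff = [(1.0 if j == i else 0.0) - alpha * P[i][u][j] for j in states]
        rhs = sum(P[i][u][j] * Cc[i][u][j] for j in states)
        rows.append((coeff, rhs))

# J* satisfies every constraint, with equality at each state's best control,
# so maximizing sum_i J(i) over these rows recovers J*.
slacks = [rhs - sum(c * v for c, v in zip(coeff, J)) for coeff, rhs in rows]
```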
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Processes
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random: the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, but the actions are not made continuously (that kind of problem refers to optimal control theory).
SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach from machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function over a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. This chapter presents methods that overcome this problem by approximation, making use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to be able to predict the output for any possible input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_{k+1}, U_k, X_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample of the cost-to-go function.
TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_{k+1}, μ(X_k), X_k) has been observed.
The cost-to-go resulting from the trajectory, starting from the state X_k, is
V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})
V(X_k): Cost-to-go of a trajectory starting from state X_k
If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by
J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)
V(i_m): Cost-to-go of the trajectory starting from state i at the m-th visit
A recursive form of the method can be formulated:
J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the visit.
From a trajectory point of view:
J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]
with γ_{X_k} = 1/m, where m is the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, and the update can only be made once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).
At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assume the l-th transition has just been generated; then J(X_k) is updated for all states visited previously during the trajectory:
J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:
J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is
J(X_l) := J(X_l) + γ_{X_l} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]
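A minimal TD(0) sketch on a made-up stochastic shortest path chain (states 0 → 1 → 2 under a fixed policy, state 2 terminal and cost-free; the expected transition costs are hypothetical numbers):

```python
import random

# TD(0) policy evaluation on a toy stochastic shortest path chain.
random.seed(0)
cost_mean = {0: 2.0, 1: 3.0}           # hypothetical expected transition costs
J = {0: 0.0, 1: 0.0, 2: 0.0}           # terminal state 2 keeps J = 0
visits = {0: 0, 1: 0}

for _ in range(5000):                   # simulated trajectories
    x = 0
    while x != 2:
        nxt = x + 1
        c = cost_mean[x] + random.uniform(-0.5, 0.5)   # noisy observed cost
        visits[x] += 1
        gamma = 1.0 / visits[x]         # step size 1/m, m = number of visits
        J[x] += gamma * (c + J[nxt] - J[x])            # TD(0) update
        x = nxt
```

With the 1/m step size, J(1) approaches 3 and J(0) approaches 5, the true expected costs-to-go.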
Q-factors: Once J_{μk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by
Q_{μk}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{μk}(j)]
Note that P(j, u, i) and C(j, u, i) must be known. The improved policy is
μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q_{μk}(i, u)
This is in fact an approximate version of the policy iteration algorithm, since J_{μk} and Q_{μk} have been estimated from the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by
Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:
J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain
Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily.
For each sample (X_k, X_{k+1}, U_k, C_k), do:
U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)
Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]
with γ defined as for TD.
The trade-off between exploration and exploitation: Convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.
In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
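The update rule and the exploration/exploitation trade-off can be sketched together with an ε-greedy rule. The two-state discounted MDP below is hypothetical, and its transition probabilities are used only to simulate samples, never by the learner itself:

```python
import random

# Q-learning with epsilon-greedy exploration (hypothetical two-state MDP).
random.seed(1)
alpha, eps = 0.5, 0.1                   # discount factor, exploration rate
states, controls = [0, 1], [0, 1]
P = {0: {0: [0.8, 0.2], 1: [0.3, 0.7]},     # used only to simulate transitions
     1: {0: [0.5, 0.5], 1: [0.1, 0.9]}}
Cc = {0: {0: [1.0, 5.0], 1: [2.0, 1.0]},
      1: {0: [3.0, 2.0], 1: [0.5, 4.0]}}

Q = {(i, u): 0.0 for i in states for u in controls}
n = {(i, u): 0 for i in states for u in controls}   # visit counts, gamma = 1/n

x = 0
for _ in range(300000):
    if random.random() < eps:                        # exploration phase
        u = random.choice(controls)
    else:                                            # exploitation (greedy)
        u = min(controls, key=lambda a: Q[(x, a)])
    j = random.choices(states, weights=P[x][u])[0]   # simulated transition
    c = Cc[x][u][j]
    n[(x, u)] += 1
    gamma = 1.0 / n[(x, u)]
    target = c + alpha * min(Q[(j, a)] for a in controls)
    Q[(x, u)] += gamma * (target - Q[(x, u)])        # Q-learning update
    x = j
```

With probability ε a random control is tried (exploration); otherwise the current greedy control is used (exploitation).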
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:
- using the direct learning approach presented in the preceding section on each sample of experience;
- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system, through simulation, with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; for large state and control spaces, however, they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that is optimized based on the available samples of J_μ. In the tabular representation investigated previously, J_μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J(i, r).
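A minimal sketch of the idea: the cost-to-go is represented by a linear architecture J(i, r) = r_0 + r_1·i with feature vector (1, i), and r is fitted by least squares to noisy samples of a made-up J_μ(i) = 2 + 3i; only the two numbers in r are stored instead of a table over all states:

```python
import random

# Linear function approximation fitted by least squares.
# The "true" cost-to-go J_mu(i) = 2 + 3*i is invented for the illustration.
random.seed(2)
samples = [(i, 2.0 + 3.0 * i + random.gauss(0, 0.1))
           for i in range(10) for _ in range(20)]

# Normal equations for the two-parameter least squares fit
s1 = len(samples)
sx = sum(i for i, _ in samples)
sxx = sum(i * i for i, _ in samples)
sy = sum(v for _, v in samples)
sxy = sum(i * v for i, v in samples)
det = s1 * sxx - sx * sx
r0 = (sy * sxx - sx * sxy) / det      # intercept
r1 = (s1 * sxy - sx * sy) / det      # slope
# Only (r0, r1) is stored instead of a table of J values over all states
```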
There are many possible methods for function approximation; this field is related to supervised learning. Possible methods include artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics, for example.
A general approach to a supervised learning problem can be:
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.
• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.
• Decide on a training algorithm.
• Gather a training set.
• Train the function with the training set. The function can then be validated using a subset of the training set.
• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples; this is already an approximation of the real function.
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs, and penalties are defined for deviations from the original maintenance time of each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure, during the stage, of a unit not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process model of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDPs have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality: it means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
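As a concrete illustration of the TD- and Q-learning methods mentioned above, the core of tabular Q-learning for a cost-minimizing MDP is a single update rule. The sketch below is illustrative only; the function and parameter names are my own assumptions, not taken from the thesis or from [24]:

```python
def q_learning_step(Q, state, action, cost, next_state, actions,
                    alpha=0.1, gamma=0.95):
    """One tabular Q-learning update for a cost-minimization MDP.

    Q maps (state, action) pairs to estimated costs-to-go; unseen pairs
    default to 0. The estimate Q(s, a) is moved toward the observed
    stage cost plus the discounted best cost-to-go of the next state.
    """
    best_next = min(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (cost + gamma * best_next - old)
    return Q
```

Because no explicit transition model appears in the update, the method can learn directly from observed transitions, which is the model-free advantage discussed above.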
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; possible approaches are average cost-to-go, discounted cost and shortest path
  Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
  Methods (classical): value iteration (VI), policy iteration (PI), linear programming
  Advantages/disadvantages: VI can converge fast for a high discount factor; PI is faster in general; linear programming allows additional constraints but handles a more limited state space than VI and PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces
  Possible application in maintenance optimization: same as MDP, for systems too large for classical MDP methods
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: optimization of inspection-based maintenance
  Methods: same as MDP
  Advantages/disadvantages: more complex (average cost-to-go approach)
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices before doing maintenance.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and to avoid maintenance during a profitable period. This idea was incorporated into the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for the component
NCM   Number of corrective maintenance states for the component
Costs
CE(s, k)   Electricity cost at stage k for the electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i
Variables
i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage
State and Control Space
x1_k   Component state at stage k
x2_k   Electricity state at stage k
Probability function
λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi
Sets
Ω^x1     Component state space
Ω^x2     Electricity state space
Ω_U(i)   Decision space for state i
State notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),   x1_k ∈ Ω^x1, x2_k ∈ Ω^x2    (9.1)

Ω^x1 is the set of possible states for the component and Ω^x2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: working (W) states when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when age Tmax is reached; in this case Tmax can, for example, correspond to the time from which λ(t) > 50%. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
[Figure: Markov chain over the states W0, ..., W4, PM1, CM1, CM2. From each working state Wq, the solid arcs (u = 0) lead to the next age state (W4 to itself) with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); the dashed arcs (u = 1) lead to PM1 with probability 1. The maintenance states progress with probability 1 and return to W0.]
Figure 9.1: Example of the Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ω^x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ω^x1 = {W0, ..., WNW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}
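The state space Ω^x1 can be enumerated mechanically. The following sketch (illustrative Python; the names are my own assumptions) reproduces the example of Figure 9.1 for NW = 4, NPM = 2, NCM = 3:

```python
def component_state_space(n_w, n_pm, n_cm):
    """Enumerate {W0..W_NW, PM1..PM_(NPM-1), CM1..CM_(NCM-1)}.

    W0 represents a new component; the last PM and CM states are
    identified with W0 and are therefore not listed separately.
    """
    return ([f"W{q}" for q in range(n_w + 1)]
            + [f"PM{q}" for q in range(1, n_pm)]
            + [f"CM{q}" for q in range(1, n_cm)])
```

For example, `component_state_space(4, 2, 3)` yields the eight states W0, ..., W4, PM1, CM1, CM2 of Figure 9.1.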
Electricity scenario state
Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω^x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure: electricity price (SEK/MWh, roughly 200-500) as a function of the stage, for scenarios 1, 2 and 3.]
Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0   no preventive maintenance
Uk = 1   preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1}  if i1 ∈ {W1, ..., WNW}
Ω_U(i) = ∅       otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
  = P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
  = P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary and can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                           u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0   CM1      λ(Wq)
WNW                          0   WNW      1 − λ(WNW)
WNW                          0   CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}         1   PM1      1
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1    1
PM_{NPM−1}                   ∅   W0       1
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1    1
CM_{NCM−1}                   ∅   W0       1
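The rules of Table 9.1 translate directly into code. In the sketch below (illustrative Python; all names are my own assumptions), `p_fail(q)` stands for the per-stage failure probability of a component of age q, i.e. λ(Wq) in the table (Ts · λ(q · Ts) in Figure 9.1):

```python
def parse(state):
    """Split a state label such as 'W3', 'PM1' or 'CM2' into (kind, index)."""
    if state.startswith(("PM", "CM")):
        return state[:2], int(state[2:])
    return "W", int(state[1:])

def component_transitions(state, u, n_w, n_pm, n_cm, p_fail):
    """Non-zero transition probabilities P(j1, u, i1) of Table 9.1."""
    kind, q = parse(state)
    if kind == "W":
        if u == 1:
            # preventive replacement (PM1 corresponds to W0 when NPM = 1)
            return {"PM1" if n_pm > 1 else "W0": 1.0}
        nxt = f"W{min(q + 1, n_w)}"          # age one stage, capped at W_NW
        fail = p_fail(q)
        return {nxt: 1.0 - fail,
                "CM1" if n_cm > 1 else "W0": fail}
    # maintenance states progress deterministically, then return to W0
    last = (n_pm if kind == "PM" else n_cm) - 1
    return {f"{kind}{q + 1}" if q < last else "W0": 1.0}
```

For the example of Figure 9.1 (NW = 4, NPM = 2, NCM = 3), a working component of age 2 either ages to W3 or fails to CM1, while PM1 returns to W0 and CM1 progresses to CM2.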
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = | 1  0  0 |     P2_E = | 1/3  1/3  1/3 |     P3_E = | 0.6  0.2  0.2 |
       | 0  1  0 |            | 1/3  1/3  1/3 |            | 0.2  0.6  0.2 |
       | 0  0  1 |            | 1/3  1/3  1/3 |            | 0.2  0.2  0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1_E P1_E P1_E P3_E P3_E P2_E P2_E P2_E P3_E P1_E P1_E P1_E
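The example of Tables 9.2 and 9.3 can be encoded as a stage-indexed list of matrices. The sketch below (illustrative Python; names are assumptions) also makes the row-stochasticity of each matrix easy to verify:

```python
P1 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0],
      [0.0, 0.0, 1.0]]              # stable season: the scenario cannot change
third = 1.0 / 3.0
P2 = [[third, third, third],
      [third, third, third],
      [third, third, third]]        # transient season: full mixing
P3 = [[0.6, 0.2, 0.2],
      [0.2, 0.6, 0.2],
      [0.2, 0.2, 0.6]]              # mostly stable, with some switching

# Stage-dependent schedule of Table 9.3 over the 12-stage horizon.
schedule = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

def electricity_transition(k, i2, j2):
    """P_k(j2, i2): probability of moving from scenario i2 to j2 at stage k."""
    return schedule[k][i2][j2]
```

Each row of each matrix sums to one, as required for a transition probability matrix.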
9.1.4.4 Cost Function
The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                           u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0   CM1      CI + CCM
WNW                          0   WNW      G · Ts · CE(i2, k)
WNW                          0   CM1      CI + CCM
Wq                           1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1    CI + CPM
PM_{NPM−1}                   ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1    CI + CCM
CM_{NCM−1}                   ∅   W0       CI + CCM
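With states, decisions, transition probabilities and costs in place, the model is solved by backward value iteration, as noted at the beginning of the chapter. The sketch below is an illustrative Python outline, not the thesis's implementation: all function names and signatures are my own assumptions, and the generation reward of Table 9.4 is assumed to enter the stage cost with a negative sign so that the whole problem is a cost minimization.

```python
def value_iteration(N, states, scenarios, decisions, trans_c, trans_e,
                    cost, terminal):
    """Backward value iteration for the finite horizon model (a sketch).

    J[(x1, x2)] is the optimal expected cost-to-go when the component is
    in state x1 and the electricity scenario is x2; `decisions(x1)`
    returns the admissible controls, `trans_c(x1, u)` and `trans_e(k, x2)`
    return dicts of transition probabilities, `cost(k, x1, u, j1, x2)` is
    the stage cost, and `terminal(x1)` the terminal cost C_N.
    """
    J = {(x1, x2): terminal(x1) for x1 in states for x2 in scenarios}
    policy = []
    for k in range(N - 1, -1, -1):           # stages N-1, ..., 0
        Jk, uk = {}, {}
        for x1 in states:
            for x2 in scenarios:
                best_u, best_v = None, float("inf")
                for u in (decisions(x1) or [None]):   # empty space: forced move
                    v = 0.0
                    for j1, p1 in trans_c(x1, u).items():
                        for j2, p2 in trans_e(k, x2).items():
                            v += p1 * p2 * (cost(k, x1, u, j1, x2) + J[(j1, j2)])
                    if v < best_v:
                        best_u, best_v = u, v
                Jk[(x1, x2)] = best_v
                uk[(x1, x2)] = best_u
        J = Jk
        policy.insert(0, uk)
    return J, policy
```

An empty decision space (component in maintenance) is treated here as a single forced transition, mirroring the ∅ rows of Tables 9.1 and 9.4.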
9.2 Multi-Component Model
In this section the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.
This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rent can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c
Costs
CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  Electricity state at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
jNC+1                  Electricity state at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c
State and Control Space
xc_k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
xNC+1_k                  Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ω^xc       State space for component c
Ω^xNC+1    Electricity state space
Ω^uc(ic)   Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1_k, ..., xNC_k, xNC+1_k)    (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.
Component space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space related to component c is denoted Ω^xc:

xc_k ∈ Ω^xc = {W0, ..., WNWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}
Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uc_k = 0   no preventive maintenance on component c
uc_k = 1   preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1_k, u2_k, ..., uNC_k)    (9.3)

The decision space for each decision variable is defined by:

∀c ∈ {1, ..., NC}:  Ω^uc(ic) = {0, 1}  if ic ∈ {W0, ..., WNWc}
                    Ω^uc(ic) = ∅       otherwise
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component states transitions
The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)
Case 2

If one of the components is in maintenance, or preventive maintenance is decided for at least one component:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1              if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
P^c = 0              otherwise
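Cases 1 and 2 can be combined into one routine. The sketch below is illustrative Python (all names are my own assumptions); `p_single(c, jc, uc, ic)` stands for the one-component transition probability of Section 9.1, and `working(ic)` tests whether ic is a working state:

```python
from math import prod  # Python 3.8+

def system_transition(i, u, j, working, p_single):
    """P(j, u, i) for the series system, following the two cases above.

    i, u and j are tuples over the NC components. In Case 1 the
    components age independently; in Case 2 the system is down, so only
    maintained components move while working components freeze.
    """
    all_working = all(working(ic) for ic in i) and not any(u)
    if all_working:
        # Case 1: product of the individual transition probabilities
        return prod(p_single(c, jc, 0, ic)
                    for c, (ic, jc) in enumerate(zip(i, j)))
    # Case 2: maintained / decided components follow their forced
    # transitions; the remaining working components keep their state.
    p = 1.0
    for c, (ic, uc, jc) in enumerate(zip(i, u, j)):
        if uc == 1 or not working(ic):
            p *= p_single(c, jc, uc, ic)
        else:
            p *= 1.0 if jc == ic else 0.0
    return p
```

The freezing of working components in Case 2 is what makes opportunistic maintenance representable: while the system is down anyway, preventive maintenance on other components incurs no extra ageing.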
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is incurred, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} C^c

with

C^c = CCMc   if ic ∈ {CM1, ..., CM_{NCMc−1}} or jc = CM1
C^c = CPMc   if ic ∈ {PM1, ..., PM_{NPMc−1}} or jc = PM1
C^c = 0      otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas of issues that could have an impact on the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically found to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models, where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states, to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
The shortest path problem is solved with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2
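The computation above can be checked mechanically. The sketch below (illustrative Python) stores the arc costs C(k, i, j) read off from the calculations, with the node letters mapped to per-stage indices (A = 0; B, C, D = 0, 1, 2; E, F, G = 0, 1, 2; H, I, J = 0, 1, 2), and runs the same backward value iteration:

```python
# Arc costs C[(k, i, j)]: stage k, from node i of stage k to node j of stage k+1.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,   # A -> B, C, D
    (1, 0, 0): 4, (1, 0, 1): 6,                 # B -> E, F
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,   # C -> E, F, G
    (1, 2, 1): 5, (1, 2, 2): 2,                 # D -> F, G
    (2, 0, 0): 2, (2, 0, 1): 5,                 # E -> H, I
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,   # F -> H, I, J
    (2, 2, 1): 1, (2, 2, 2): 2,                 # G -> I, J
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,   # H, I, J -> terminal
}

def shortest_path(C, n_stages=4):
    """Backward value iteration over the arc costs; returns J*_0(0)."""
    J = {0: 0.0}                                # terminal cost phi(0) = 0
    for k in range(n_stages - 1, -1, -1):
        Jk = {}
        for i in {i for (kk, i, _) in C if kk == k}:
            Jk[i] = min(c + J[j] for (kk, ii, j), c in C.items()
                        if kk == k and ii == i)
        J = Jk
    return J[0]
```

Running `shortest_path(C)` returns 8, matching the value J*_0(0) = J*(A) found above.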
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306. SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
68
6.2 Optimality Equations

The optimality equations are formulated using the transition probability function P(j, u, i).

The stationary policy μ* that solves an IHSDP shortest path problem is a solution of the Bellman equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):
    J_μ(i) = min_{μ(i)∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + J_μ(j)],  ∀i ∈ Ω_X

J_μ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i
For a discounted IHSDP, the optimality equation is

    J_μ(i) = min_{μ(i)∈Ω_U(i)} Σ_{j∈Ω_X} P_ij(u) · [C_ij(u) + α · J_μ(j)],  ∀i ∈ Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.
6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown that it converges to the optimal solution. If the model is discounted, the method can be fast: its time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined to terminate the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
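As a concrete sketch, the value iteration recursion for a discounted problem can be written as below. The two-state replacement MDP (state 0 = working, state 1 = failed; action 0 = operate, action 1 = replace) and all its numbers are hypothetical, chosen only to exercise the algorithm; the array entry P[u, i, j] plays the role of P(j, u, i).

```python
import numpy as np

# Hypothetical two-state replacement MDP; P[u, i, j] = P(j, u, i).
P = np.array([[[0.95, 0.05],   # operate from working: fails with prob 0.05
               [0.00, 1.00]],  # operate from failed: stays failed
              [[1.00, 0.00],   # replace from working (preventive)
               [1.00, 0.00]]]) # replace from failed (corrective)
C = np.array([[[0.0, 50.0],    # failure cost
               [0.0, 20.0]],   # interruption cost while failed
              [[10.0, 0.0],    # preventive replacement cost
               [25.0, 0.0]]])  # corrective replacement cost
alpha = 0.9                    # discount factor

def value_iteration(P, C, alpha, tol=1e-9):
    nU, nX, _ = P.shape
    EC = np.einsum('uij,uij->ui', P, C)   # expected immediate cost per (u, i)
    J = np.zeros(nX)
    while True:
        Q = EC + alpha * (P @ J)          # Q[u, i] = sum_j P(j,u,i)(C(j,u,i) + alpha J(j))
        J_new = Q.min(axis=0)
        if np.abs(J_new - J).max() < tol:
            return J_new, Q.argmin(axis=0)
        J = J_new

J, mu = value_iteration(P, C, alpha)
```

With these numbers, the recursion converges to the policy "operate while the component works, replace it on failure".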
6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is applied iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ0. It can then be described by the following steps.
Step 1: Policy Evaluation

If μq+1 = μq, stop the algorithm. Else, calculate J_μq(i), the solution of the following linear system:

    J_μq(i) = Σ_{j∈Ω_X} P(j, μq(i), i) · [C(j, μq(i), i) + J_μq(j)]

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μq.
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

    μq+1(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_μq(j)]

Go back to the policy evaluation step.

The process stops when μq+1 = μq.
At each iteration, the algorithm improves the policy. If the initial policy μ0 is already good, the algorithm converges quickly to the optimal solution.
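The two steps above can be sketched as follows, here for the discounted variant so that the evaluation system is always solvable; the two-state replacement MDP and its numbers are the same kind of hypothetical example as before (P[u, i, j] read as P(j, u, i)).

```python
import numpy as np

P = np.array([[[0.95, 0.05], [0.0, 1.0]],    # action 0: operate
              [[1.00, 0.00], [1.0, 0.0]]])   # action 1: replace
C = np.array([[[0.0, 50.0], [0.0, 20.0]],
              [[10.0, 0.0], [25.0, 0.0]]])
alpha = 0.9

def policy_iteration(P, C, alpha):
    nU, nX, _ = P.shape
    idx = np.arange(nX)
    EC = np.einsum('uij,uij->ui', P, C)      # expected immediate cost per (u, i)
    mu = np.zeros(nX, dtype=int)             # initial policy mu_0: always operate
    while True:
        # Step 1: policy evaluation -- solve the linear system
        #   J(i) = sum_j P(j, mu(i), i) (C(j, mu(i), i) + alpha J(j))
        P_mu = P[mu, idx, :]
        J = np.linalg.solve(np.eye(nX) - alpha * P_mu, EC[mu, idx])
        # Step 2: policy improvement
        mu_new = (EC + alpha * (P @ J)).argmin(axis=0)
        if np.array_equal(mu_new, mu):       # the policy solves its own improvement
            return J, mu
        mu = mu_new

J, mu = policy_iteration(P, C, alpha)
```

Starting from the "always operate" policy, the algorithm terminates here after two improvement steps.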
6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value J_μk(i).
While m ≥ 0, do

    J^m_μk(i) = Σ_{j∈Ω_X} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_μk(j)],  ∀i ∈ Ω_X
    m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_μk is approximated by J^0_μk.
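A minimal sketch of the M-step evaluation, again for the discounted variant of the same kind of hypothetical two-state replacement MDP; J is initialized above the true value function and swept M times with the fixed-policy operator.

```python
import numpy as np

P = np.array([[[0.95, 0.05], [0.0, 1.0]],   # P[u, i, j] = P(j, u, i); 0=operate, 1=replace
              [[1.00, 0.00], [1.0, 0.0]]])
C = np.array([[[0.0, 50.0], [0.0, 20.0]],
              [[10.0, 0.0], [25.0, 0.0]]])
alpha = 0.9

def evaluate_policy_m_steps(P, C, alpha, mu, J_init, M):
    """Approximate J_mu by M value iteration sweeps under the fixed policy mu."""
    nX = P.shape[1]
    idx = np.arange(nX)
    P_mu = P[mu, idx, :]
    EC_mu = np.einsum('ij,ij->i', P_mu, C[mu, idx, :])  # expected stage cost under mu
    J = np.asarray(J_init, dtype=float).copy()
    for _ in range(M):                      # m = M-1, ..., 0
        J = EC_mu + alpha * (P_mu @ J)
    return J

mu = np.array([0, 1])                       # operate while working, replace on failure
J_approx = evaluate_policy_m_steps(P, C, alpha, mu, [100.0, 100.0], M=200)
```

For large M the estimate coincides with the exact policy evaluation of Section 6.4; small M trades accuracy for computation.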
6.6 Average Cost-to-go Problems

The methods presented in Sections 5.1-5.4 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated, and they imply conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the model of the Markov decision process is proved to be unichain, that is, if all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states (see [36] for details).
Given a stationary policy μ and a state X ∈ Ω_X, there is a unique scalar λ_μ and vector h_μ such that

    h_μ(X) = 0

    λ_μ + h_μ(i) = Σ_{j∈Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)],  ∀i ∈ Ω_X

This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and the optimal policy satisfy the Bellman equation:

    λ* + h*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)],  ∀i ∈ Ω_X

    μ*(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)],  ∀i ∈ Ω_X
6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. X is an arbitrary state and h^0(i) is chosen arbitrarily:
    H^k = min_{u∈Ω_U(X)} Σ_{j∈Ω_X} P(j, u, X) · [C(j, u, X) + h^k(j)]

    h^{k+1}(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)] − H^k,  ∀i ∈ Ω_X

    μ^{k+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)],  ∀i ∈ Ω_X
The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
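The recursion above can be sketched as follows on a hypothetical two-state replacement MDP (state 0 = working, 1 = failed; action 0 = operate, 1 = replace; P[u, i, j] read as P(j, u, i)); H^k converges to the optimal average cost per stage.

```python
import numpy as np

P = np.array([[[0.95, 0.05], [0.0, 1.0]],   # action 0: operate
              [[1.00, 0.00], [1.0, 0.0]]])  # action 1: replace
C = np.array([[[0.0, 50.0], [0.0, 20.0]],
              [[10.0, 0.0], [25.0, 0.0]]])

def relative_value_iteration(P, C, ref=0, iters=2000):
    nU, nX, _ = P.shape
    EC = np.einsum('uij,uij->ui', P, C)     # expected immediate cost per (u, i)
    h = np.zeros(nX)
    for _ in range(iters):
        Q = EC + P @ h                      # Q[u, i] = sum_j P(j,u,i)(C(j,u,i) + h(j))
        H = Q[:, ref].min()                 # offset taken at the reference state X
        h = Q.min(axis=0) - H               # keeps h(X) = 0
    return H, h, Q.argmin(axis=0)           # H approximates the optimal average cost

lam, h, mu = relative_value_iteration(P, C)
```

For these hypothetical numbers the optimal average cost can also be computed by hand from the stationary distribution of the optimal policy, which makes the sketch easy to check.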
6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy

If λq+1 = λq and hq+1(i) = hq(i) ∀i ∈ Ω_X, stop the algorithm. Else, solve the system of equations

    hq(X) = 0

    λq + hq(i) = Σ_{j∈Ω_X} P(j, μq(i), i) · [C(j, μq(i), i) + hq(j)],  ∀i ∈ Ω_X

Step 2: Policy improvement

    μq+1(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + hq(j)],  ∀i ∈ Ω_X

    q = q + 1
6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP,

    J_μ(i) = min_{μ(i)∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J_μ(j)],  ∀i ∈ Ω_X

J_μ(i) is the solution of the following linear programming model:

    Maximize  Σ_{i∈Ω_X} J(i)
    Subject to  J(i) ≤ Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J(j)],  ∀u ∈ Ω_U(i), ∀i ∈ Ω_X
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
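The LP above can be sketched with an off-the-shelf solver; here SciPy's linprog is used (which minimizes, so the objective is negated), and the two-state replacement MDP with its numbers is hypothetical, as in the earlier sketches.

```python
import numpy as np
from scipy.optimize import linprog

P = np.array([[[0.95, 0.05], [0.0, 1.0]],   # P[u, i, j] = P(j, u, i); 0=operate, 1=replace
              [[1.00, 0.00], [1.0, 0.0]]])
C = np.array([[[0.0, 50.0], [0.0, 20.0]],
              [[10.0, 0.0], [25.0, 0.0]]])
alpha = 0.9
nU, nX, _ = P.shape
EC = np.einsum('uij,uij->ui', P, C)         # expected immediate cost per (u, i)

# One constraint per (state, action): J(i) - alpha sum_j P(j,u,i) J(j) <= EC(u, i)
A_ub, b_ub = [], []
for u in range(nU):
    for i in range(nX):
        row = -alpha * P[u, i, :]
        row[i] += 1.0
        A_ub.append(row)
        b_ub.append(EC[u, i])

# Maximize sum_i J(i)  <=>  minimize -sum_i J(i)
res = linprog(c=-np.ones(nX), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * nX)
J = res.x                                   # optimal cost-to-go J*
```

At the optimum, the binding constraints correspond exactly to the optimal actions, so the LP recovers the same J* as value iteration.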
6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite, and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, or Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and that use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.
The cost-to-go resulting from the trajectory starting from state Xk is

    V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V(Xk): Cost-to-go of a trajectory starting from state Xk
If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

    J(i) = (1/K) · Σ_{m=1}^{K} V(im)

V(im): Cost-to-go of the trajectory starting from state i at its mth visit
A recursive form of the method can be formulated:

    J(i) := J(i) + γ · [V(im) − J(i)],  with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

    J(Xk) := J(Xk) + γ_Xk · [V(Xk) − J(Xk)]

γ_Xk corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) must be calculated from the whole trajectory, and can therefore be used only once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).

At each transition of the trajectory, the cost-to-go function of the states visited so far is updated. Assume that the lth transition is being generated; then J(Xk) is updated for all states that have been visited previously during the trajectory:

    J(Xk) := J(Xk) + γ_Xk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)],  ∀k = 0, ..., l
TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

    J(Xk) := J(Xk) + γ_Xk · λ^{l−k} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)],  ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0; the TD(0) algorithm is

    J(Xk) := J(Xk) + γ_Xk · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
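A minimal sketch of TD(0) policy evaluation on a hypothetical stochastic shortest path: a component deteriorates through states 0, 1, 2 and is then replaced (terminal state), moving one step with probability 0.5 at a cost of 1 per stage. The exact cost-to-go is J = (6, 4, 2), so the estimates can be checked.

```python
import random

def td0_policy_evaluation(episodes=20000, seed=0):
    rng = random.Random(seed)
    J = [0.0, 0.0, 0.0]          # cost-to-go estimates for states 0, 1, 2
    visits = [0, 0, 0]
    for _ in range(episodes):
        x = 0
        while x is not None:
            nxt = x + 1 if rng.random() < 0.5 else x   # deteriorate or stay
            if nxt == 3:
                nxt = None                             # terminal: component replaced
            visits[x] += 1
            gamma = 1.0 / visits[x]                    # step size 1/m, m = visits to x
            target = 1.0 + (J[nxt] if nxt is not None else 0.0)
            J[x] += gamma * (target - J[x])            # TD(0) update
            x = nxt
    return J

J = td0_policy_evaluation()
```

The update uses only the current transition, so it runs on-line, without waiting for the trajectory to finish.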
Q-factors: Once J_μk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

    Q_μk(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_μk(j)]

Note that P(j, u, i) and C(j, u, i) must be known for this step.

The improved policy is

    μk+1(i) = argmin_{u∈Ω_U(i)} Q_μk(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J_μk and Q_μk have been estimated using the samples.
7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

    Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)
The optimality equation can be rewritten in terms of Q-factors:

    J*(i) = min_{u∈Ω_U(i)} Q*(i, u)    (7.2)

By combining the two equations, we obtain

    Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]    (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.
For each sample (Xk, Xk+1, Uk, Ck), do

    Uk = argmin_{u∈Ω_U(Xk)} Q(Xk, u)

    Q(Xk, Uk) := (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈Ω_U(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.

The exploration/exploitation trade-off: The convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, in which a base policy (also called a greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
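A sketch of Q-learning with ε-greedy exploration on a hypothetical two-state replacement MDP (state 0 = working, 1 = failed; action 0 = operate, 1 = replace); the learner only sees simulated transitions, never the transition probabilities. As an assumption of this sketch, a polynomial step size m^(-0.6) replaces the 1/m schedule, a common stochastic approximation choice that converges faster here.

```python
import random

# (state, action) -> [(next state, probability)]; costs indexed as C(j, u, i)
P = {(0, 0): [(0, 0.95), (1, 0.05)], (1, 0): [(1, 1.0)],
     (0, 1): [(0, 1.0)],             (1, 1): [(0, 1.0)]}
COST = {(1, 0, 0): 50.0, (1, 0, 1): 20.0, (0, 1, 0): 10.0, (0, 1, 1): 25.0}
ALPHA = 0.9   # discount factor

def q_learning(steps=200000, eps=0.2, seed=1):
    rng = random.Random(seed)
    Q = {(i, u): 0.0 for i in (0, 1) for u in (0, 1)}
    visits = dict.fromkeys(Q, 0)
    x = 0
    for _ in range(steps):
        # exploration / exploitation trade-off
        if rng.random() < eps:
            u = rng.choice((0, 1))
        else:
            u = min((0, 1), key=lambda a: Q[(x, a)])
        r, acc = rng.random(), 0.0
        for nxt, p in P[(x, u)]:             # sample the next state
            acc += p
            if r < acc:
                break
        c = COST.get((nxt, u, x), 0.0)
        visits[(x, u)] += 1
        gamma = visits[(x, u)] ** -0.6       # decaying step size (assumption)
        target = c + ALPHA * min(Q[(nxt, a)] for a in (0, 1))
        Q[(x, u)] = (1.0 - gamma) * Q[(x, u)] + gamma * target
        x = nxt
    return Q

Q = q_learning()
```

After training, the greedy policy read off from Q is to operate while the component works and replace it on failure, matching the model-based solution for these numbers.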
7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system, through simulation with direct learning.
7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; however, for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation Ĵ(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the table representation investigated previously, J_μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − Ĵ(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.

A general approach to a supervised learning problem can be:
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples; this is already an approximation of the real function.
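The idea can be sketched minimally: replace a tabular J_μ(i) by a parametric Ĵ(i, r) = φ(i)ᵀr and fit r by least squares on (state, cost-to-go) training pairs. The scalar "component age" state and the quadratic features below are assumptions for illustration only; in practice the targets would come from simulated trajectories.

```python
import numpy as np

def features(age):
    # hypothetical feature vector phi(i) for a scalar "component age" state
    return np.array([1.0, age, age ** 2])

def fit_r(states, targets):
    # least-squares fit of J^(i, r) = phi(i) . r to sampled cost-to-go values
    Phi = np.array([features(s) for s in states])
    r, *_ = np.linalg.lstsq(Phi, np.asarray(targets, dtype=float), rcond=None)
    return r

def J_approx(age, r):
    return float(features(age) @ r)

# Synthetic training set standing in for simulation samples
states = list(range(10))
targets = [2.0 + 0.5 * s + 0.1 * s * s for s in states]
r = fit_r(states, targets)        # only r is stored, not a table over all states
```

The vector r (three numbers) then generalizes to any age, including states never present in the training set.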
Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure, during the stage, of a unit not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization, using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is revealed. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality: the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go criterion, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance optimization and scheduling
  Methods: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; average cost-to-go, discounted, or shortest path formulations
  Possible applications in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
  Methods: classical MDP methods, namely value iteration (VI), which can converge fast for a high discount factor; policy iteration (PI), faster in general; and linear programming, which allows possible additional constraints but handles a state space more limited than VI and PI

Approximate Dynamic Programming
  Characteristics: can handle large state spaces compared with classical MDP methods
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning and Q-learning, which can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval; complex
  Possible application in maintenance optimization: optimization for inspection-based maintenance
  Methods: same as MDP (average cost-to-go approach)
Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component, and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was incorporated into the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
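The seasonal argument above can be encoded as a stage-dependent transition matrix for the electricity state. The three scenarios (high, medium, low price), the choice of summer weeks, and all probabilities below are hypothetical illustrations, not part of the proposed model's data.

```python
import numpy as np

def price_transition_matrix(stage, summer_stages):
    """Row s gives the probabilities of moving from scenario s to each scenario."""
    if stage in summer_stages:
        # transient summer period: the hydro situation, and hence the price
        # scenario, may still change (probabilities are assumptions)
        return np.array([[0.6, 0.3, 0.1],
                         [0.2, 0.6, 0.2],
                         [0.1, 0.3, 0.6]])
    # outside the summer the scenario is stable (dry year / wet year fixed)
    return np.eye(3)

summer = set(range(22, 35))           # e.g. weeks 23-35 of a 52-week horizon
T_summer = price_transition_matrix(25, summer)
T_winter = price_transition_matrix(2, summer)
```

In a finite horizon SDP, the matrix for stage k would simply be looked up when computing the expected cost-to-go over the electricity state.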
912 Notations for the Proposed Model
Numbers
NE Number of electricity scenarioNW Number of working state for the componentNPM Number of preventive maintenance state for one componentNCM Number of corrective maintenance state for one component
Costs
CE(s k) Electricity cost at stage k for the electricity state sCI Cost per stage for interruptionCPM Cost per stage of Preventive maintenanceCCM Cost per stage of Corrective maintenanceCN (i) Terminal cost if the component is in state i
Variables
i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage
State and Control Space
x1_k   Component state at stage k
x2_k   Electricity state at stage k
Probability function
λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi
Sets
Ωx1      Component state space
Ωx2      Electricity state space
ΩU(i)    Decision space for state i
States notations
W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, …, N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, an interruption cost CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, …, N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k)^T,   x1_k ∈ Ωx1, x2_k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable, x1_k. There are three types of possible states for this variable: working (W) states when the component is operating, corrective maintenance (CM) states if the component is in maintenance due to a failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; Tmax can, for example, correspond to the time when λ(t) exceeds a given threshold. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
Figure 9.1: Example of Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0 (from Wq the component ages with probability 1 − Ts·λ(q) and fails to CM1 with probability Ts·λ(q)); dashed lines: u = 1. The maintenance states PM1, CM1 and CM2 progress with probability 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ωx1 = {W0, …, W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, …, WNW, PM1, …, PM(NPM−1), CM1, …, CM(NCM−1)}
Electricity scenario state
Electricity scenarios are associated with one state variable, x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, …, SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3. The three scenario price curves lie between roughly 200 and 500 SEK/MWh over stages k−1, k, k+1.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, …, WNW}
        ∅      else
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q·Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1. Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E or P3_E. i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                         u    j1       P(j1, u, i1)
Wq, q ∈ {0, …, NW−1}       0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, …, NW−1}       0    CM1      λ(Wq)
WNW                        0    WNW      1 − λ(WNW)
WNW                        0    CM1      λ(WNW)
Wq, q ∈ {0, …, NW}         1    PM1      1
PMq, q ∈ {1, …, NPM−2}     ∅    PMq+1    1
PM(NPM−1)                  ∅    W0       1
CMq, q ∈ {1, …, NCM−2}     ∅    CMq+1    1
CM(NCM−1)                  ∅    W0       1
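As an illustration, the stationary component-state transition matrices of Table 9.1 can be assembled programmatically. This is only a sketch under an assumed state ordering and a hypothetical failure-rate callable `lam(q)` standing for λ(Wq); it is not the implementation used in the thesis, and it assumes NPM, NCM ≥ 2.

```python
import numpy as np

def build_transition_matrices(lam, Ts, NW, NPM, NCM):
    """Stationary transition matrices of Table 9.1 for one component.
    States ordered [W0..WNW, PM1..PM(NPM-1), CM1..CM(NCM-1)];
    lam(q) returns the failure rate in state Wq (hypothetical encoding).
    Returns P0 (decision u = 0) and P1 (decision u = 1)."""
    n = (NW + 1) + (NPM - 1) + (NCM - 1)
    PM1 = NW + 1                 # index of state PM1
    CM1 = NW + NPM               # index of state CM1
    P0 = np.zeros((n, n))
    P1 = np.zeros((n, n))
    for q in range(NW + 1):
        p_fail = Ts * lam(q)
        nxt = q + 1 if q < NW else NW        # age one stage, or stay in WNW
        P0[q, nxt] = 1.0 - p_fail
        P0[q, CM1] = p_fail                  # failure starts corrective maintenance
        P1[q, PM1] = 1.0                     # preventive replacement decided
    for s in range(PM1, CM1):                # PM chain ends in W0 (new component)
        P0[s, s + 1 if s + 1 < CM1 else 0] = 1.0
    for s in range(CM1, n):                  # CM chain ends in W0 as well
        P0[s, s + 1 if s + 1 < n else 0] = 1.0
    P1[PM1:, :] = P0[PM1:, :]                # no decision in maintenance states
    return P0, P1

# Parameters of Figure 9.1: NW = 4, NPM = 2, NCM = 3
P0, P1 = build_transition_matrices(lambda q: 0.01 * (q + 1), 1.0, 4, 2, 3)
print(P0.sum(axis=1))   # every row sums to one
```

The failure rate used here is an arbitrary increasing function chosen only so the matrices are concrete.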
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = | 1  0  0 |     P2_E = | 1/3  1/3  1/3 |     P3_E = | 0.6  0.2  0.2 |
       | 0  1  0 |            | 1/3  1/3  1/3 |            | 0.2  0.6  0.2 |
       | 0  0  1 |            | 1/3  1/3  1/3 |            | 0.2  0.2  0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
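To make the stage-dependent scenario dynamics of Tables 9.2 and 9.3 concrete, the sketch below propagates a scenario probability distribution through the 12-stage horizon. The matrix values are taken from the tables; everything else (function and variable names) is illustrative.

```python
import numpy as np

# Transition matrices of Table 9.2 (row: current scenario i2, column: next j2)
P1E = np.eye(3)
P2E = np.full((3, 3), 1.0 / 3.0)
P3E = np.array([[0.6, 0.2, 0.2],
                [0.2, 0.6, 0.2],
                [0.2, 0.2, 0.6]])

# Stage schedule of Table 9.3 on the 12-stage horizon
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def scenario_distribution(p0, schedule):
    """Propagate an initial scenario distribution through the horizon."""
    p = np.asarray(p0, dtype=float)
    for Pk in schedule:
        p = p @ Pk
    return p

# Starting surely in scenario 1: the uniform matrix P2E (the transient
# summer period) erases the initial information, so the final distribution
# is uniform over the three scenarios.
print(scenario_distribution([1.0, 0.0, 0.0], schedule))
```

This illustrates the modelling idea of Section 9.1.1: the scenario is essentially frozen outside the summer (P1_E) and redrawn during it (P2_E, P3_E).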
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G·Ts·CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon; a possible terminal cost is defined by CN(i) for each possible terminal component state i. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4: Transition costs

i1                         u    j1       Ck(j, u, i)
Wq, q ∈ {0, …, NW−1}       0    Wq+1     G·Ts·CE(i2, k)
Wq, q ∈ {0, …, NW−1}       0    CM1      CI + CCM
WNW                        0    WNW      G·Ts·CE(i2, k)
WNW                        0    CM1      CI + CCM
Wq                         1    PM1      CI + CPM
PMq, q ∈ {1, …, NPM−2}     ∅    PMq+1    CI + CPM
PM(NPM−1)                  ∅    W0       CI + CPM
CMq, q ∈ {1, …, NCM−2}     ∅    CMq+1    CI + CCM
CM(NCM−1)                  ∅    W0       CI + CCM
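The model of Section 9.1 is solved by the standard finite horizon backward recursion. The generic sketch below shows that recursion with stage-dependent probability and cost matrices (as needed here, since the electricity transitions and prices depend on k). The data layout and the one-stage toy example are assumptions for illustration, not the thesis's implementation.

```python
import numpy as np

def backward_induction(P, C, CN):
    """Finite-horizon stochastic DP (backward recursion).
    P[k][u] and C[k][u]: n x n transition-probability and transition-cost
    matrices for decision u at stage k (None where u is not admissible);
    CN: terminal cost vector.  Returns cost-to-go J[k][i] and decisions U[k][i]."""
    N, n = len(P), len(CN)
    J = np.zeros((N + 1, n))
    U = np.zeros((N, n), dtype=int)
    J[N] = CN
    for k in range(N - 1, -1, -1):
        for i in range(n):
            best, best_u = np.inf, 0
            for u in range(len(P[k])):
                if P[k][u] is None:
                    continue
                # expected transition cost plus expected cost-to-go
                q = float(P[k][u][i] @ (C[k][u][i] + J[k + 1]))
                if q < best:
                    best, best_u = q, u
            J[k, i], U[k, i] = best, best_u
    return J, U

# One-stage toy check: from state 0 the system stays or fails with equal
# probability; the terminal cost penalizes the failed state 1.
P_demo = [[np.array([[0.5, 0.5], [0.0, 1.0]])]]
C_demo = [[np.array([[1.0, 2.0], [0.0, 0.0]])]]
J, U = backward_induction(P_demo, C_demo, np.array([0.0, 10.0]))
print(J[0])   # expected costs from each state at stage 0
```

For the maintenance model, the state index i would enumerate the pairs (x1_k, x2_k), P[k][u] would combine Table 9.1 with Pk(j2, i2), and C[k][u] would follow Table 9.4 (with the generation reward entered as a negative cost).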
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c
Costs
CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i
Variables
ic, c ∈ {1, …, NC}    State of component c at the current stage
iNC+1                 Electricity state at the current stage
jc, c ∈ {1, …, NC}    State of component c for the next stage
jNC+1                 Electricity state for the next stage
uc, c ∈ {1, …, NC}    Decision variable for component c
State and Control Space
xc_k, c ∈ {1, …, NC}   State of component c at stage k
xc                     A component state
xNC+1_k                Electricity state at stage k
uc_k                   Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, …, NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1_k, …, xNC_k, xNC+1_k)^T    (9.2)

xc_k, c ∈ {1, …, NC}, represents the state of component c, and xNC+1_k represents the electricity state.
Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space related to component c is noted Ωxc:

xc_k ∈ Ωxc = {W0, …, WNWc, PM1, …, PM(NPMc−1), CM1, …, CM(NCMc−1)}
Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:
uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1_k, u2_k, …, uNC_k)^T    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, …, NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, …, WNWc}
                           ∅      else
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component states transitions
The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1: All the components are working and no maintenance is decided. The transition probability of the whole system is then the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, …, NC}: xc_k ∈ {W1, …, WNWc} and uc = 0:

P((j1, …, jNC), 0, (i1, …, iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)
Case 2: One of the components is in maintenance, or a preventive maintenance decision is made. Then:

P((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) = ∏_{c=1}^{NC} P^c

with P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, …, WNWc}
           1              if ic ∈ {W0, …, WNWc}, uc = 0 and jc = ic
           0              else
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1: All the components are working, no maintenance is decided and no failure happens. A reward for the electricity produced is obtained.

If ∀c ∈ {1, …, NC}: xc_k ∈ {W1, …, WNWc} and uc = 0:

C((j1, …, jNC), 0, (i1, …, iNC)) = G·Ts·CE(iNC+1, k)

Case 2: The system is in maintenance or fails during the stage. An interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) = CI + ∑_{c=1}^{NC} C^c

with C^c = CCMc   if ic ∈ {CM1, …, CM(NCMc−1)} or jc = CM1
           CPMc   if ic ∈ {PM1, …, PM(NPMc−1)} or jc = PM1
           0      else
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for one.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. ADP methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP could, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6,   u*_2(0) = u*(E) = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5,   u*_2(1) = u*(F) = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3,   u*_2(2) = u*(G) = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10,   u*_1(0) = u*(B) = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6,   u*_1(1) = u*(C) = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5,   u*_1(2) = u*(D) = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8,   u*_0(0) = u*(A) = 2
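The backward recursion above can be checked numerically. The arc costs below are read off the calculations in this appendix (stage, current state, next state, with the node labels A through J as in the thesis); the function itself is only a verification sketch.

```python
# Arc costs C[k][(i, j)] read off the appendix; states per stage:
# A | B, C, D | E, F, G | H, I, J | terminal node.
C = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},             # A -> B, C, D
    {(0, 0): 4, (0, 1): 6,                         # B -> E, F
     (1, 0): 2, (1, 1): 1, (1, 2): 3,              # C -> E, F, G
     (2, 1): 5, (2, 2): 2},                        # D -> F, G
    {(0, 0): 2, (0, 1): 5,                         # E -> H, I
     (1, 0): 7, (1, 1): 3, (1, 2): 2,              # F -> H, I, J
     (2, 1): 1, (2, 2): 2},                        # G -> I, J
    {(0, 0): 4, (1, 0): 2, (2, 0): 7},             # H, I, J -> terminal
]

def solve_shortest_path(C):
    """Backward induction (value iteration) over the staged graph."""
    J = {0: 0.0}                    # terminal cost phi(0) = 0
    for stage in reversed(C):
        Jnew = {}
        for (i, j), cost in stage.items():
            cand = cost + J[j]
            if i not in Jnew or cand < Jnew[i]:
                Jnew[i] = cand
        J = Jnew
    return J

print(solve_shortest_path(C))   # cost-to-go from node A
```

The result reproduces J*(A) = 8, matching the stage 0 computation above.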
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
algorithm improves the expected cost-to-go function by enhancing the actual policy. This two-step algorithm is used iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ0. It can then be described by the following steps.
Step 1: Policy Evaluation

If μq+1 = μq, stop the algorithm. Otherwise, Jμq(i), the solution of the following linear system, is calculated:

Jμq(i) = ∑_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + Jμq(j)]

where q is the iteration number of the policy iteration algorithm. This is the expected cost-to-go function of the system using the policy μq.
Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μq+1(i) = argmin_{u∈ΩU(i)} ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jμq(j)]

Go back to the policy evaluation step. The process stops when μq+1 = μq.
At each iteration the algorithm always improves the policy. If the initial policy μ0 is already good, then the algorithm will converge quickly to the optimal solution.
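The two steps can be sketched compactly. This is a minimal sketch of the discounted variant (a discount factor gamma is assumed so the evaluation step is a well-posed linear system), with every decision admissible in every state; it is not the thesis's implementation.

```python
import numpy as np

def policy_iteration(P, C, gamma=0.95):
    """Policy iteration for a discounted MDP.
    P[u] and C[u]: n x n transition-probability and transition-cost
    matrices for decision u (all decisions assumed admissible)."""
    n = P[0].shape[0]
    mu = np.zeros(n, dtype=int)
    while True:
        # Step 1, policy evaluation: solve (I - gamma * P_mu) J = c_mu
        P_mu = np.array([P[mu[i]][i] for i in range(n)])
        c_mu = np.array([P[mu[i]][i] @ C[mu[i]][i] for i in range(n)])
        J = np.linalg.solve(np.eye(n) - gamma * P_mu, c_mu)
        # Step 2, policy improvement: one value iteration sweep
        Q = np.array([[P[u][i] @ (C[u][i] + gamma * J) for u in range(len(P))]
                      for i in range(n)])
        mu_new = Q.argmin(axis=1)
        if np.array_equal(mu_new, mu):   # policy solves its own improvement
            return J, mu
        mu = mu_new

# Toy two-state example: staying in state 0 costs 1 per stage forever,
# while paying 2 once to jump to the absorbing, cost-free state 1.
P0 = np.array([[1.0, 0.0], [0.0, 1.0]])
P1 = np.array([[0.0, 1.0], [0.0, 1.0]])
C0 = np.array([[1.0, 1.0], [0.0, 0.0]])
C1 = np.array([[2.0, 2.0], [2.0, 2.0]])
print(policy_iteration([P0, P1], [C0, C1]))
```

On this toy problem the algorithm stops after two improvements, preferring the one-time jump cost of 2 over the recurring cost of staying.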
6.5 Modified Policy Iteration
If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value Jμk(i).
While m ≥ 0 do:

J^m_μk(i) = ∑_{j∈ΩX} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_μk(j)]   ∀i ∈ ΩX
m ← m − 1

where m is the number of iterations left in the evaluation step of modified policy iteration. The algorithm stops when m = 0, and Jμk is approximated by J^0_μk.
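The truncated evaluation step can be sketched as follows; again a discounted variant is assumed for illustration, and the data layout matches the policy iteration sketch above rather than anything in the thesis.

```python
import numpy as np

def modified_policy_iteration(P, C, gamma=0.95, M=20, sweeps=50):
    """Modified policy iteration sketch: the evaluation step runs only M
    fixed-policy backups instead of solving the linear system exactly.
    P[u], C[u]: n x n matrices as in the policy iteration sketch."""
    n = P[0].shape[0]
    J = np.zeros(n)
    for _ in range(sweeps):
        # greedy policy with respect to the current value estimate
        Q = np.array([[P[u][i] @ (C[u][i] + gamma * J) for u in range(len(P))]
                      for i in range(n)])
        mu = Q.argmin(axis=1)
        # approximate policy evaluation: M value iteration backups
        for _ in range(M):
            J = np.array([P[mu[i]][i] @ (C[mu[i]][i] + gamma * J)
                          for i in range(n)])
    return J, mu

# Same toy MDP as for policy iteration: stay for 1 per stage, or pay 2 once.
P0 = np.array([[1.0, 0.0], [0.0, 1.0]])
P1 = np.array([[0.0, 1.0], [0.0, 1.0]])
C0 = np.array([[1.0, 1.0], [0.0, 0.0]])
C1 = np.array([[2.0, 2.0], [2.0, 2.0]])
J, mu = modified_policy_iteration([P0, P1], [C0, C1])
print(J, mu)
```

Choosing M trades evaluation accuracy per sweep against the cost of each sweep; M = 1 recovers value iteration and M → ∞ recovers exact policy iteration.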
6.6 Average Cost-to-go Problems
The methods presented in Sections 5.1-5.4 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).
Given a stationary policy μ and a state X̄ ∈ ΩX, there is a unique λμ and vector hμ such that:

hμ(X̄) = 0
λμ + hμ(i) = ∑_{j∈ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)]   ∀i ∈ ΩX

This λμ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈ΩU(i)} ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ ΩX

μ*(i) = argmin_{u∈ΩU(i)} ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ ΩX
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X̄ is an arbitrary state and h0(i) is chosen arbitrarily:

Hk = min_{u∈ΩU(X̄)} ∑_{j∈ΩX} P(j, u, X̄) · [C(j, u, X̄) + hk(j)]

hk+1(i) = min_{u∈ΩU(i)} ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk   ∀i ∈ ΩX

μk+1(i) = argmin_{u∈ΩU(i)} ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)]   ∀i ∈ ΩX
The sequence hk converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
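In practice the iteration is stopped once hk stabilizes, as in the sketch below (all names and the stopping tolerance are illustrative, and every decision is assumed admissible in every state).

```python
import numpy as np

def relative_value_iteration(P, C, xbar=0, tol=1e-9, max_iter=10000):
    """Relative value iteration for the average cost-to-go problem.
    P[u], C[u]: n x n matrices; xbar: arbitrary reference state.
    Returns (average cost, relative values h, greedy policy)."""
    n = P[0].shape[0]
    h = np.zeros(n)
    for _ in range(max_iter):
        Q = np.array([[P[u][i] @ (C[u][i] + h) for u in range(len(P))]
                      for i in range(n)])
        Hk = Q[xbar].min()              # normalization at the reference state
        h_new = Q.min(axis=1) - Hk
        if np.abs(h_new - h).max() < tol:
            break
        h = h_new
    return Hk, h_new, Q.argmin(axis=1)

# Toy unichain MDP: both decisions mix the two states uniformly; decision 0
# costs 1 per stage only from state 0, decision 1 always costs 2.
P0 = np.array([[0.5, 0.5], [0.5, 0.5]])
C0 = np.array([[1.0, 1.0], [0.0, 0.0]])
C1 = np.array([[2.0, 2.0], [2.0, 2.0]])
lam, h, pol = relative_value_iteration([P0, P0], [C0, C1])
print(lam, h, pol)
```

Subtracting Hk at every step keeps h bounded; at convergence Hk approximates the optimal average cost λ*, here 0.5 (the stage cost of 1 incurred half the time).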
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.

Initialization: X̄ ∈ Ω_X and the initial policy μ_0 can be chosen arbitrarily; set q = 0.

Step 1 (evaluation of the policy): solve the system of equations

h_q(X̄) = 0

λ_q + h_q(i) = Σ_{j∈Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)]   ∀i ∈ Ω_X

If q > 0, λ_q = λ_{q−1} and h_q(i) = h_{q−1}(i) ∀i ∈ Ω_X, stop the algorithm.

Step 2 (policy improvement):

μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_q(j)]   ∀i ∈ Ω_X

Set q = q + 1 and go back to Step 1.
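A minimal sketch of this average-cost policy iteration, on a small hypothetical unichain MDP (P[u][i][j] = P(j, u, i), C[u][i][j] = C(j, u, i); all numbers are illustrative). The evaluation step solves the linear system {h(X̄) = 0, λ + h(i) = Σ_j P_μ(i,j)[C_μ(i,j) + h(j)]} directly; the sketch stops when the policy is stable, which for a finite MDP is equivalent to the value-based test above.

```python
import numpy as np

P = np.array([
    [[0.9, 0.1], [0.6, 0.4]],
    [[0.2, 0.8], [0.3, 0.7]],
])
C = np.array([
    [[1.0, 5.0], [2.0, 6.0]],
    [[4.0, 3.0], [1.0, 2.0]],
])
n, ref = 2, 0
policy = np.zeros(n, dtype=int)            # initial policy, arbitrary

for _ in range(50):
    # Step 1: policy evaluation -- unknowns are (lam, h), with h(ref) = 0.
    Pmu = P[policy, np.arange(n)]          # Pmu[i, j] = P(j, mu(i), i)
    cmu = (Pmu * C[policy, np.arange(n)]).sum(axis=1)  # expected stage cost
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    A[:n, 0] = 1.0                         # coefficient of lam
    A[:n, 1:] = np.eye(n) - Pmu            # h(i) - sum_j Pmu[i,j] h(j)
    b[:n] = cmu
    A[n, 1 + ref] = 1.0                    # enforces h(ref) = 0
    x = np.linalg.solve(A, b)
    lam, h = x[0], x[1:]
    # Step 2: policy improvement
    new_policy = (P * (C + h)).sum(axis=2).argmin(axis=0)
    if np.array_equal(new_policy, policy):
        break                              # policy stable: optimal
    policy = new_policy
```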
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that cannot be included in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP, the optimal cost-to-go satisfies

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)]   ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize   Σ_{i∈Ω_X} J(i)

Subject to   J(i) − α · Σ_{j∈Ω_X} P(j, u, i) · J(j) ≤ Σ_{j∈Ω_X} P(j, u, i) · C(j, u, i)   ∀i ∈ Ω_X, ∀u ∈ Ω_U(i)
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
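As a sketch, the LP above can be assembled and solved numerically, here with scipy.optimize.linprog on a small hypothetical discounted MDP (P[u][i][j] = P(j, u, i), C[u][i][j] = C(j, u, i); all numbers are illustrative). Since J* is the largest vector satisfying the constraints, maximizing Σ_i J(i) means minimizing −Σ_i J(i) in the solver's convention.

```python
import numpy as np
from scipy.optimize import linprog

P = np.array([
    [[0.9, 0.1], [0.6, 0.4]],
    [[0.2, 0.8], [0.3, 0.7]],
])
C = np.array([
    [[1.0, 5.0], [2.0, 6.0]],
    [[4.0, 3.0], [1.0, 2.0]],
])
alpha = 0.9
n_u, n, _ = P.shape

# One inequality J(i) - alpha * sum_j P(j,u,i) J(j) <= sum_j P(j,u,i) C(j,u,i)
# per state-action pair (i, u).
A_ub, b_ub = [], []
for u in range(n_u):
    for i in range(n):
        row = -alpha * P[u, i]             # coefficients of J(j)
        row[i] += 1.0                      # +1 on J(i)
        A_ub.append(row)
        b_ub.append(np.dot(P[u, i], C[u, i]))

res = linprog(c=-np.ones(n),               # maximize sum_i J(i)
              A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n)
J_lp = res.x                               # approximates J*
```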
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.

Let n and m denote the numbers of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m, and is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. However, linear programming methods become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, the algorithm converges quickly if the initial policy μ_0 is already good. There is strong empirical evidence in favor of policy iteration over value iteration and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Processes
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and actions are not taken continuously (that kind of problem refers to optimal control theory).
SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how an SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Processes -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach from machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.
The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space, and are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ, using samples resulting from the use of this policy. The method can be used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample of the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.
The cost-to-go resulting from the trajectory, starting from the state X_k, is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): cost-to-go of a trajectory starting from state X_k.
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J̃(i) = (1/K) · Σ_{m=1}^{K} V(i, m)

V(i, m): cost-to-go of the trajectory following the m-th visit to state i.
A recursive form of the method can be formulated:

J̃(i) := J̃(i) + γ · [V(i, m) − J̃(i)], with γ = 1/m, where m is the number of the trajectory.
From a trajectory point of view:

J̃(X_k) := J̃(X_k) + γ_{X_k} · [V(X_k) − J̃(X_k)]

where γ_{X_k} corresponds to 1/m, m being the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).
At each transition of the trajectory, the cost-to-go estimates of the states already visited are updated. Assume that the l-th transition has just been generated; then J̃(X_k) is updated for all the states visited previously during the trajectory:

J̃(X_k) := J̃(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J̃(X_{l+1}) − J̃(X_l)]   ∀k = 0, ..., l
TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J̃(X_k) := J̃(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J̃(X_{l+1}) − J̃(X_l)]   ∀k = 0, ..., l
Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J̃(X_k) := J̃(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J̃(X_{k+1}) − J̃(X_k)]
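A minimal sketch of TD(0) policy evaluation, on a hypothetical stochastic shortest path problem invented for illustration: from state 0 the fixed policy leads to state 1 with probability 0.5 (cost 1) or directly to the terminal state 2 (cost 4); from state 1 it always reaches the terminal state with cost 2. The exact cost-to-go is J(1) = 2 and J(0) = 0.5·(1 + 2) + 0.5·4 = 3.5.

```python
import random

random.seed(0)
J = {0: 0.0, 1: 0.0, 2: 0.0}        # state 2 is terminal
visits = {0: 0, 1: 0, 2: 0}

for _ in range(20000):
    x = 0                            # every trajectory starts in state 0
    while x != 2:
        if x == 0:
            x_next, cost = (1, 1.0) if random.random() < 0.5 else (2, 4.0)
        else:
            x_next, cost = 2, 2.0
        visits[x] += 1
        gamma = 1.0 / visits[x]      # step size 1/m, as in the text
        # TD(0): J(x) := J(x) + gamma * [C(x, x') + J(x') - J(x)]
        J[x] += gamma * (cost + J[x_next] - J[x])
        x = x_next
```

With the 1/m step size, the estimates converge to the sample averages, so J[1] hits 2 exactly and J[0] approaches 3.5.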
Q-factors. Once J^{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^{μ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J^{μ_k}(j)]

Note that the model (P(j, u, i) and C(j, u, i)) must be known for this step. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q^{μ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^{μ_k} and Q^{μ_k} have been estimated using the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_k, X_{k+1}, U_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The trade-off between exploration and exploitation. Convergence of the algorithm to the optimal solution would require that all pairs (i, u) are tried infinitely often, which is not realistic.
In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
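A minimal sketch of Q-learning with an ε-greedy exploration rule, on a hypothetical deterministic shortest path problem invented for illustration: in state 0, control 0 jumps to the terminal state 2 with cost 3, while control 1 moves to state 1 with cost 1; in state 1 the only control (0) reaches the terminal state with cost 1. The optimal Q-factors are Q(0,0) = 3, Q(0,1) = 2 and Q(1,0) = 1.

```python
import random

random.seed(0)
controls = {0: [0, 1], 1: [0]}       # Omega_U(i); state 2 is terminal
step = {(0, 0): (2, 3.0), (0, 1): (1, 1.0), (1, 0): (2, 1.0)}
Q = {(i, u): 0.0 for i in controls for u in controls[i]}
counts = {k: 0 for k in Q}
eps = 0.2                            # exploration probability

for _ in range(2000):
    x = 0
    while x != 2:
        # trade-off exploration / exploitation (epsilon-greedy)
        if random.random() < eps:
            u = random.choice(controls[x])
        else:
            u = min(controls[x], key=lambda a: Q[(x, a)])
        x_next, cost = step[(x, u)]
        counts[(x, u)] += 1
        gamma = 1.0 / counts[(x, u)]
        future = 0.0 if x_next == 2 else min(Q[(x_next, a)] for a in controls[x_next])
        # Q(x,u) := (1 - gamma) Q(x,u) + gamma [C + min_a Q(x', a)]
        Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * (cost + future)
        x = x_next
```

After training, the greedy policy in state 0 picks control 1, the cheaper path.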
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system, through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. This approach is suitable for moderate-size problems; however, for large state and control spaces it would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function J^μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that is optimized based on the available samples of J^μ. In the table representation previously investigated, J^μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).
There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist: the training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
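A minimal sketch of the idea of replacing a table by a parametric structure J̃(i, r): here the "samples" of J^μ are generated from a hypothetical quadratic function of the state, the approximation is linear in the parameter vector r with features φ(i) = (1, i, i²), and r is fitted by least squares. All names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
states = np.arange(50)                       # a hypothetical 50-state system
# Noisy samples of an underlying cost-to-go J_mu(i) = 2 + 0.5 i + 0.01 i^2.
J_samples = 2.0 + 0.5 * states + 0.01 * states**2 + rng.normal(0, 0.1, 50)

# Feature matrix phi(i) = (1, i, i^2); only the fitted vector r is stored,
# instead of one table entry per state.
Phi = np.column_stack([np.ones_like(states, dtype=float), states, states**2])
r, *_ = np.linalg.lstsq(Phi, J_samples, rcond=None)

def J_tilde(i, r=r):
    """Approximate cost-to-go J~(i, r) = phi(i) . r"""
    return r[0] + r[1] * i + r[2] * i**2
```

The same structure also generalizes to states that were never sampled, which a tabular representation cannot do.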
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure during the stage of a unit not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built, using the calculated state probabilities and the optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDPs have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization, using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantage put forward is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined for deviations from normal operation of the system. The proposed approach should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components, with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an existing model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance optimization and scheduling
  Methods: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes (average cost-to-go, discounted, or shortest path)
  Characteristics: stationary model
  Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization
  Methods: classical methods for MDPs - value iteration (VI, can converge fast for a high discount factor), policy iteration (PI, faster in general), linear programming (possible additional constraints, but state space more limited than for VI and PI)

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: optimization of inspection-based maintenance
  Methods: same as MDPs
  Advantages/disadvantages: complex (average cost-to-go approach)

Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than classical MDP methods
  Possible application in maintenance optimization: same as MDPs, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component, and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, so as to be operational later and avoid maintenance during a profitable period. This idea was incorporated in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

NE: number of electricity scenarios
NW: number of working states for the component
NPM: number of preventive maintenance states for the component
NCM: number of corrective maintenance states for the component

Costs

CE(s, k): electricity cost at stage k in electricity state s
CI: cost per stage for interruption
CPM: cost per stage of preventive maintenance
CCM: cost per stage of corrective maintenance
CN(i): terminal cost if the component is in state i

Variables

i1: component state at the current stage
i2: electricity state at the current stage
j1: possible component state at the next stage
j2: possible electricity state at the next stage

State and control space

x1_k: component state at stage k
x2_k: electricity state at stage k

Probability functions

λ(t): failure rate of the component at age t
λ(Wq): failure rate of the component in state Wq

Sets

Ω_x1: component state space
Ω_x2: electricity state space
Ω_U(i): decision space for state i

State notations

W: working state
PM: preventive maintenance state
CM: corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption of CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario; NX = 2. The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),   x1_k ∈ Ω_x1, x2_k ∈ Ω_x2   (9.1)
Ω_x1 is the set of possible states for the component, and Ω_x2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for this variable: working states (W), corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undertaken preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can correspond, for example, to the time when λ(t) exceeds 50%. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
Figure 9.1: Example of the Markov decision process for one component, with NCM = 3, NPM = 2 and NW = 4. The working states W0, ..., W4 age with probability 1 − Ts · λ(q) and fail (to CM1) with probability Ts · λ(q). Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.
More generally,

Ω_x1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}
Electricity scenario state

Electricity scenarios are associated with the state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.
The example considers three electricity scenarios, corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios (electricity prices in SEK/MWh over the stages), NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance
The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and Ω_U(i) = ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | Uk = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | Uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probabilities

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts). The transition probabilities for the component state are stationary, and can be represented as a Markov decision process, as in the example in Figure 9.1.
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 (respectively NCM = 1), then PM1 (respectively CM1) corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices, and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u  j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0  Wq+1     1 − Ts · λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0  CM1      Ts · λ(Wq)
WNW                         0  WNW      1 − Ts · λ(WNW)
WNW                         0  CM1      Ts · λ(WNW)
Wq, q ∈ {0, ..., NW}        1  PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅  PMq+1    1
PM(NPM−1)                   ∅  W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅  CMq+1    1
CM(NCM−1)                   ∅  W0       1
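As a sketch, the component transitions of Table 9.1 can be assembled into stage-transition matrices. The values NW = 4, NPM = 2, NCM = 3 match the example of Figure 9.1, while Ts and the failure rate function are purely illustrative assumptions; for the maintenance states (empty decision space), the same row is used under both controls.

```python
import numpy as np

NW, NPM, NCM, Ts = 4, 2, 3, 1.0
lam = lambda q: 0.02 * (q + 1)          # hypothetical failure rate lam(Wq)

# State ordering: W0..W4, PM1, CM1, CM2 (as in Figure 9.1).
W = list(range(NW + 1))
PM = [NW + 1 + q for q in range(NPM - 1)]        # PM1 .. PM(NPM-1)
CM = [NW + NPM + q for q in range(NCM - 1)]      # CM1 .. CM(NCM-1)
n = NW + NPM + NCM - 1

P0 = np.zeros((n, n))                   # u = 0: no preventive maintenance
P1 = np.zeros((n, n))                   # u = 1: preventive maintenance
for q in W:
    nxt = W[q + 1] if q < NW else W[NW]  # age by one state, or stay in WNW
    P0[q, nxt] = 1 - Ts * lam(q)
    P0[q, CM[0]] = Ts * lam(q)           # failure starts corrective maint.
    P1[q, PM[0]] = 1.0                   # replacement starts
for chain in (PM, CM):                   # maintenance runs to completion,
    for a, b in zip(chain, chain[1:]):   # then back to a new component W0
        P0[a, b] = P1[a, b] = 1.0
    P0[chain[-1], W[0]] = P1[chain[-1], W[0]] = 1.0
```

Every row of both matrices sums to one, which is a convenient consistency check on Table 9.1.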
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = ( 1    0    0   )     P2_E = ( 1/3  1/3  1/3 )     P3_E = ( 0.6  0.2  0.2 )
       ( 0    1    0   )            ( 1/3  1/3  1/3 )            ( 0.2  0.6  0.2 )
       ( 0    0    1   )            ( 1/3  1/3  1/3 )            ( 0.2  0.2  0.6 )
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2)    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
9.1.4.4 Cost Function
The costs associated with the possible transitions are of different kinds:

• reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k);

• costs for maintenance: CCM or CPM;

• cost for interruption: CI.
Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
A possible terminal cost is defined by CN(i1) for each possible terminal state i1 of the component.
Table 9.4: Transition costs

i1                          u   j1      Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0   CM1     CI + CCM
WNW                         0   WNW     G · Ts · CE(i2, k)
WNW                         0   CM1     CI + CCM
Wq                          1   PM1     CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   CI + CPM
PMNPM−1                     ∅   W0      CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   CI + CCM
CMNCM−1                     ∅   W0      CI + CCM
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.
This can be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c
Costs
CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}    State of component c at the actual stage
iNC+1                   State of the electricity at the actual stage
jc, c ∈ {1, ..., NC}    State of component c at the next stage
jNC+1                   State of the electricity at the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c
State and Control Space
xc,k, c ∈ {1, ..., NC}    State of component c at stage k
xc                        A component state
xNC+1,k                   Electricity state at stage k
uc,k                      Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic
923 Assumptions
• The system is composed of NC components in series: if one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.
• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is done on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh is produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal state of component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector, as in (9.2):

Xk = (x1,k, ..., xNC,k, xNC+1,k)^T    (9.2)

xc,k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.
Component Space
The numbers of CM and PM states for component c are NCMc and NPMc, respectively. The number of W states for component c, NWc, is decided in the same way as for the one-component model.

The state space related to component c is noted Ωxc:

xc,k ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity Space
Same as in Section 8.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uc,k = 0: no preventive maintenance on component c
uc,k = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1,k, u2,k, ..., uNC,k)^T    (9.3)
The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, ∅ otherwise
9.2.4.3 Transition Probability
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)    (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 8.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. Consequently, different cases must be considered.

Case 1
If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W0, ..., WNWc} and uc = 0, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)
Case 2
If at least one component is in maintenance, or preventive maintenance is decided for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC P^c

with

P^c = P(jc, 1, ic)   if uc = 1
P^c = P(jc, ∅, ic)   if ic ∉ {W0, ..., WNWc} (component c is already in maintenance)
P^c = 1              if uc = 0, ic ∈ {W0, ..., WNWc} and jc = ic (a working component that is not maintained does not age)
P^c = 0              otherwise
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.
Case 1
If all the components are working, no maintenance is decided and no failure happens during the stage, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W0, ..., WNWc} and uc = 0, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)
Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC C^c

with

C^c = CCMc   if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
C^c = CPMc   if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
C^c = 0      otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas of issues that could impact the model:
• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality, there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically proved to converge fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
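The backward recursion above can be checked numerically. The sketch below re-implements the value iteration with the arc costs read off the computation above; the encoding of nodes as (stage, index) pairs, with the decision u taken as the destination node index, is an assumption of the sketch.

```python
import math

# C[(k, i)] maps a decision u (the destination node index) to the arc cost,
# as read from the worked example above.
C = {
    (3, 0): {0: 4}, (3, 1): {0: 2}, (3, 2): {0: 7},
    (2, 0): {0: 2, 1: 5}, (2, 1): {0: 7, 1: 3, 2: 2}, (2, 2): {1: 1, 2: 2},
    (1, 0): {0: 4, 1: 6}, (1, 1): {0: 2, 1: 1, 2: 3}, (1, 2): {1: 5, 2: 2},
    (0, 0): {0: 2, 1: 4, 2: 3},
}

J = {(4, 0): 0.0}          # terminal condition at stage 4
policy = {}
for k in (3, 2, 1, 0):     # backward over the stages
    for (kk, i) in [key for key in C if key[0] == k]:
        best_u, best = None, math.inf
        for u, cost in C[(kk, i)].items():
            nxt = (k + 1, 0) if k == 3 else (k + 1, u)  # stage-3 arcs reach the sink
            if cost + J[nxt] < best:
                best_u, best = u, cost + J[nxt]
        J[(kk, i)], policy[(kk, i)] = best, best_u
```

Running it reproduces J*0(0) = 8 with u*0(0) = 2, matching the hand computation.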
Reference List
[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001
[2] Mohamed A-H. Inspection, maintenance and replacement models. Computers & Operations Research, 22(4):435-441, 1995.
[3] SV Amari and LH Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464-469, 2006.
[4] N Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers / Göteborg University, 2004. Licentiate Thesis.
[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996
[6] I Bagai and K Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156-162, 1994.
[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965
[8] R Bellman Dynamic Programming Princeton University Press Princeton1957
[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997
[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976
[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979
[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005
[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996
[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006
[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991
[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997
[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966
[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004
[19] A Haurie and P L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387-393, 1982.
[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004
[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004
[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004
[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996
[24] D Kalles, A Stathaki, and RE King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P L'Ecuyer and A Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117-118, 1983.
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31-37, 1999.
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556-560, 1993.
[36] Martin L Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176-1191, 2006.
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
While m ≥ 0 do:

    J^m_μk(i) = Σ_{j∈ΩX} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_μk(j)], ∀i ∈ ΩX
    m ← m − 1

m: number of iterations left in the evaluation step of modified policy iteration.

The evaluation stops after the iteration with m = 0, and Jμk is approximated by J^0_μk.
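A compact sketch of this truncated evaluation step follows; the data layout and names are assumptions made for the example, not code from the thesis.

```python
# m-step policy evaluation used in modified policy iteration: apply the
# Bellman operator of the fixed policy mu to an initial guess J0, m times.

def evaluate_policy_m_steps(P, cost, mu, J0, m):
    """P[i][u][j]: transition probability; cost[i][u][j]: stage cost;
    mu[i]: action prescribed by the policy in state i."""
    J = dict(J0)
    for _ in range(m):
        J = {i: sum(P[i][mu[i]][j] * (cost[i][mu[i]][j] + J[j]) for j in J)
             for i in J}
    return J
```

With a small m this gives only an approximation of Jμ, which is exactly the trade-off the modified policy iteration exploits.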
6.6 Average Cost-to-go Problems
The methods presented in Sections 5.1-5.4 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as a shortest path problem if the Markov decision process is proved to be unichain, that is, if all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states (see [36] for details).
Given a stationary policy μ and a reference state X̄ ∈ ΩX, there is a unique λμ and vector hμ such that

hμ(X̄) = 0
λμ + hμ(i) = Σ_{j∈ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)], ∀i ∈ ΩX
This λμ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.
The optimal average cost and optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ ΩX

μ*(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ ΩX
6.6.1 Relative Value Iteration
The value iteration method can be adapted to average cost-to-go problems; it is then called relative value iteration. X̄ is an arbitrary reference state and h0(i) is chosen arbitrarily.

Hk = min_{u∈ΩU(X̄)} Σ_{j∈ΩX} P(j, u, X̄) · [C(j, u, X̄) + hk(j)]

hk+1(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk, ∀i ∈ ΩX

μk+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)], ∀i ∈ ΩX
The sequence hk will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
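The iteration above can be sketched as follows. The names, the data layout and the fixed iteration count are assumptions of the sketch; `ref` plays the role of the arbitrary reference state X̄.

```python
# Relative value iteration for an average cost-to-go MDP.
# P[i][u][j]: transition probability; cost[i][u][j]: stage cost;
# actions[i]: decision space in state i.

def relative_value_iteration(P, cost, actions, ref, iters=100):
    states = list(P)
    h = {i: 0.0 for i in states}
    H = 0.0
    for _ in range(iters):
        def backup(i):
            return min(sum(P[i][u][j] * (cost[i][u][j] + h[j]) for j in states)
                       for u in actions[i])
        H = backup(ref)                        # estimate of the average cost
        h = {i: backup(i) - H for i in states} # relative cost-to-go
    return H, h
```

As stated in the text, convergence requires the unichain property; on a periodic chain the iterates may oscillate.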
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.

Initialisation: X̄ and the initial policy μ0 can be chosen arbitrarily.

Step 1: Evaluation of the policy.
If λq+1 = λq and hq+1(i) = hq(i) ∀i ∈ ΩX, stop the algorithm. Else, solve the system of equations

hq(X̄) = 0
λq + hq(i) = Σ_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + hq(j)], ∀i ∈ ΩX

Step 2: Policy improvement.

μq+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hq(j)], ∀i ∈ ΩX

q ← q + 1
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP, the optimal cost-to-go satisfies

J(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · J(j)], ∀i ∈ ΩX

J(i) is the solution of the following linear programming model:

Maximize Σ_{i∈ΩX} J(i)
Subject to J(i) ≤ Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · J(j)], ∀i ∈ ΩX, ∀u ∈ ΩU(i)
At present linear programming has not proven to be an efficient method for solvinglarge discounted MDPs however innovations in LP algorithms in the past decademight change this [36]
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.
If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].
Since the policy iteration algorithm improves the policy at each iteration, the algorithm converges quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and actions are not taken continuously (problems of that kind refer to optimal control theory).
SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.
SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.
The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.
Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a laborious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.
In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.
The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.
The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.
TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.

The cost-to-go resulting from the trajectory, starting from state Xk, is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V(Xk): cost-to-go of a trajectory starting from state Xk.
If a certain number of trajectories has been generated, and state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i, m)

V(i, m): cost-to-go of a trajectory starting from state i after the m-th visit.
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i, m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(Xk) := J(Xk) + γXk · [V(Xk) − J(Xk)]

γXk corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) must be calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).

At each transition of the trajectory, the cost-to-go function of the states visited so far is updated. Assume that the l-th transition has just been generated. Then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) := J(Xk) + γXk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l
TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l
Note that TD(1) is identical to the policy evaluation by simulation described above. Another special case is λ = 0; the TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
Q-factors: Once J^{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^{μ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J^{μ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q^{μ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^{μ_k} and Q^{μ_k} have been estimated from samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. It estimates the Q-factors directly, without the multiple policy evaluations needed by the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
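The Q-learning iteration above can be sketched as follows in Python. The two-state repair problem, its costs, and the ε-greedy exploration scheme are hypothetical, chosen only to make the update concrete; the update line itself mirrors the formula above, with γ = 1/n given by visit counts per state-control pair.

```python
import random

# Hypothetical problem: states 0 (good) and 1 (degraded) are transient,
# state 2 is terminal. Control u=0 "continue", u=1 "repair" (ends episode).
# step() samples the next state and returns (j, cost) -- assumed dynamics.
def step(x, u, rng):
    if u == 1:                       # repair: deterministic cost, episode ends
        return 2, 5.0
    if x == 0:                       # continue from 0: degrade or fail
        return (1, 1.0) if rng.random() < 0.8 else (2, 10.0)
    return (1, 1.0) if rng.random() < 0.3 else (2, 10.0)   # continue from 1

def q_learning(episodes=20000, eps=0.1, seed=1):
    rng = random.Random(seed)
    Q = {(x, u): 0.0 for x in (0, 1) for u in (0, 1)}
    n = {key: 0 for key in Q}                    # visit counts -> gamma = 1/n
    for _ in range(episodes):
        x = 0
        while x != 2:
            # epsilon-greedy trade-off between exploration and exploitation
            if rng.random() < eps:
                u = rng.choice((0, 1))
            else:
                u = min((0, 1), key=lambda a: Q[(x, a)])
            j, c = step(x, u, rng)
            n[(x, u)] += 1
            gamma = 1.0 / n[(x, u)]
            target = c + (0.0 if j == 2 else min(Q[(j, v)] for v in (0, 1)))
            # Q(Xk,Uk) := (1-gamma) Q(Xk,Uk) + gamma [C + min_u Q(Xk+1,u)]
            Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
            x = j
    return Q

Q = q_learning()
```

For these invented numbers, the optimal Q-factors are Q(·, 1) = 5 (repairing always costs 5) and Q(0, 0) ≈ 6.8, so the learned greedy policy repairs immediately; the sketch illustrates the mechanics of the iteration rather than an interesting maintenance policy.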
The exploration/exploitation trade-off: Convergence of the algorithms to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called the greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience, or

- building on-line a model of the transition probabilities and the cost function, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems, but for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J^μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J^μ. In the tabular representation investigated previously, J^μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

A function approximator must generalize well over the state space the information gained from the samples. In other words, it should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
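As an illustration of replacing a tabular J^μ(i) by a parametric approximation J̃(i, r), the following sketch fits a linear-in-parameters approximator by least squares on noisy cost-to-go samples. The quadratic feature choice, the "true" cost-to-go function, and the noise model are all invented for the example; the point is that only the three-element vector r is stored, not a table over all states.

```python
import random

# Fit J~(i, r) = r[0] + r[1]*i + r[2]*i**2 to noisy cost-to-go samples by
# solving the 3x3 normal equations (F^T F) r = F^T y. The "true" function
# and the noise are assumed purely for illustration.
def features(i):
    return (1.0, float(i), float(i) ** 2)   # hand-picked input features

def fit(samples):
    A = [[0.0] * 3 for _ in range(3)]       # F^T F
    b = [0.0] * 3                           # F^T y
    for i, y in samples:
        f = features(i)
        for p in range(3):
            b[p] += f[p] * y
            for q in range(3):
                A[p][q] += f[p] * f[q]
    # Gaussian elimination (forward elimination, then back substitution)
    for p in range(3):
        for q in range(p + 1, 3):
            m = A[q][p] / A[p][p]
            for t in range(3):
                A[q][t] -= m * A[p][t]
            b[q] -= m * b[p]
    r = [0.0] * 3
    for p in (2, 1, 0):
        r[p] = (b[p] - sum(A[p][q] * r[q] for q in range(p + 1, 3))) / A[p][p]
    return r

rng = random.Random(0)
true_J = lambda i: 2.0 + 0.5 * i + 0.1 * i * i          # assumed target
data = [(i, true_J(i) + rng.gauss(0, 0.05)) for i in range(20) for _ in range(30)]
r = fit(data)
```

With 600 noisy samples, the fitted vector r recovers the assumed coefficients (2.0, 0.5, 0.1) closely, and J̃(i, r) can then be evaluated at states never seen in the samples, which is exactly the generalization property required above.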
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling-horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants, the main advantage put forward being the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; it could then be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). This assumption is related to the principle of optimality: it means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; they are also more complex. The models found in the literature consider only single components with a single state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model
  Methods: classical MDP methods; possible approaches:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI), which can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; Policy Iteration (PI), faster in general
  - Shortest path: Linear Programming; possible additional constraints, but the state space is more limited than with VI and PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces
  Possible application: same as MDP, for systems larger than classical MDP methods can handle
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application: optimization for inspection-based maintenance
  Methods: same as MDP (average cost-to-go approach)
  Advantages/disadvantages: complex
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable; another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was included in the model: the electricity price is a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. Within each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power; the electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

N_E    Number of electricity scenarios
N_W    Number of working states for the component
N_PM   Number of preventive maintenance states for the component
N_CM   Number of corrective maintenance states for the component

Costs

C_E(s, k)  Electricity price at stage k in electricity state s
C_I        Cost per stage for interruption
C_PM       Cost per stage of preventive maintenance
C_CM       Cost per stage of corrective maintenance
C_N(i)     Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1_k  Component state at stage k
x2_k  Electricity state at stage k

Probability functions

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state W_i

Sets

Ω_x1    Component state space
Ω_x2    Electricity state space
Ω_U(i)  Decision space for state i

State notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, an interruption cost C_I per stage is incurred.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector X_k is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario; N_X = 2.

The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k)ᵀ,  x1_k ∈ Ω_x1, x2_k ∈ Ω_x2   (9.1)

Ω_x1 is the set of possible states for the component, and Ω_x2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component are N_CM and N_PM, respectively.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This second approach was implemented. In both cases, the corresponding number of W states is N_W = Tmax/Ts, or the closest integer.
[Figure 9.1: Example of Markov decision process for one component with N_CM = 3, N_PM = 2, N_W = 4. Solid lines: u = 0; dashed lines: u = 1. States W0–W4, PM1, CM1, CM2; under u = 0, each Wq moves to Wq+1 (W4 to itself) with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); maintenance states progress with probability 1.]
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}
Electricity scenario state

Electricity scenarios are associated with the state variable x2_k. There are N_E possible states for this variable, each corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2: Example of electricity scenarios, N_E = 3. Electricity prices (SEK/MWh, roughly 200–500) for scenarios 1–3 over stages k−1, k, k+1.]
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW}, ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · P_k(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                            u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}        0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}        0   CM1      λ(Wq)
W_NW                          0   W_NW     1 − λ(W_NW)
W_NW                          0   CM1      λ(W_NW)
Wq, q ∈ {0, ..., NW}          1   PM1      1
PMq, q ∈ {1, ..., NPM−2}      ∅   PMq+1    1
PM_{NPM−1}                    ∅   W0       1
CMq, q ∈ {1, ..., NCM−2}      ∅   CMq+1    1
CM_{NCM−1}                    ∅   W0       1

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = ( 1    0    0   )    P2_E = ( 1/3  1/3  1/3 )    P3_E = ( 0.6  0.2  0.2 )
       ( 0    1    0   )           ( 1/3  1/3  1/3 )           ( 0.2  0.6  0.2 )
       ( 0    0    1   )           ( 1/3  1/3  1/3 )           ( 0.2  0.2  0.6 )
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0     1     2     3     4     5     6     7     8     9     10    11
P_k(j2, i2)   P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
9.1.4.4 Cost Function
The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · C_E(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: C_CM or C_PM

• Cost for interruption: C_I
Moreover, a terminal cost, denoted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by C_N(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                            u   j1       C_k(j, u, i)
Wq, q ∈ {0, ..., NW−1}        0   Wq+1     G · Ts · C_E(i2, k)
Wq, q ∈ {0, ..., NW−1}        0   CM1      C_I + C_CM
W_NW                          0   W_NW     G · Ts · C_E(i2, k)
W_NW                          0   CM1      C_I + C_CM
Wq                            1   PM1      C_I + C_PM
PMq, q ∈ {1, ..., NPM−2}      ∅   PMq+1    C_I + C_PM
PM_{NPM−1}                    ∅   W0       C_I + C_PM
CMq, q ∈ {1, ..., NCM−2}      ∅   CMq+1    C_I + C_CM
CM_{NCM−1}                    ∅   W0       C_I + C_CM
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers

N_C     Number of components
N_Wc    Number of working states for component c
N_PMc   Number of preventive maintenance states for component c
N_CMc   Number of corrective maintenance states for component c

Costs

C_PMc    Cost per stage of preventive maintenance for component c
C_CMc    Cost per stage of corrective maintenance for component c
C_Nc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
i_{NC+1}               Electricity state at the current stage
jc, c ∈ {1, ..., NC}   State of component c for the next stage
j_{NC+1}               Electricity state for the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC}  State of component c at stage k
xc                      A component state
x^{NC+1}_k              Electricity state at stage k
uc_k                    Maintenance decision for component c at stage k

Probability functions

λc(i)  Failure probability function for component c

Sets

Ω_xc          State space for component c
Ω_x^{NC+1}    Electricity state space
Ω_uc(ic)      Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages, with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component, to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.
• An interruption cost C_I is incurred whenever maintenance is being done on the system, whatever the maintenance.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

X_k = (x1_k, ..., x^{NC}_k, x^{NC+1}_k)ᵀ   (9.2)

where xc_k, c ∈ {1, ..., NC}, represents the state of component c, and x^{NC+1}_k represents the electricity state.
Component space

The numbers of CM and PM states for component c are N_CMc and N_PMc, respectively. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is denoted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u1_k, u2_k, ..., u^{NC}_k)ᵀ   (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}: Ω_uc(ic) = {0, 1} if ic ∈ {W0, ..., W_NWc}, ∅ otherwise
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state x^{NC+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)   (9.4)
= P((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) · P_k(j_{NC+1}, i_{NC+1})   (9.5)

The transition probabilities of the electricity state, P_k(j_{NC+1}, i_{NC+1}), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1: If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc},

P((j1, ..., j_NC), 0, (i1, ..., i_NC)) = Π_{c=1}^{NC} P(jc, 0, ic)
Case 2: If one of the components is in maintenance, or preventive maintenance is decided for some component, then

P((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) = Π_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., W_NWc}
P^c = 1             if ic ∈ {W1, ..., W_NWc}, uc = 0 and jc = ic
P^c = 0             otherwise

That is, components in maintenance or sent to maintenance progress according to their own transitions, while a working component on which no maintenance is decided keeps its state (it does not age while the system is stopped).
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.
Case 1: If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc},

C((j1, ..., j_NC), 0, (i1, ..., i_NC)) = G · Ts · C_E(i_{NC+1}, k)
Case 2: When the system is in maintenance or fails during the stage, an interruption cost C_I is incurred, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) = C_I + Σ_{c=1}^{NC} C^c

with

C^c = C_CMc  if ic ∈ {CM1, ..., CM_NCMc} or jc = CM1
C^c = C_PMc  if ic ∈ {PM1, ..., PM_NPMc} or jc = PM1
C^c = 0      otherwise
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, instead of an individual decision space for each component state variable.
• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality, there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: it is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecast states: it could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods for solving infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate, the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The ADP methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximation of a finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2
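The backward recursion can be checked with a short script. The costs C(k, i, u) below are exactly those used in the calculations above (at stage 3, control 0 leads to the terminal state):

```python
# Backward value iteration for the shortest path example.
# C[(k, i, u)]: cost of applying control u in state i at stage k;
# for k < 3, control u is the state reached at stage k + 1.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}

J = {(4, 0): 0}          # terminal cost phi(0) = 0
policy = {}
for k in range(3, -1, -1):                      # stages 3, 2, 1, 0
    for i in {i for (kk, i, u) in C if kk == k}:
        # candidate costs: transition cost plus cost-to-go at stage k+1
        cand = {u: C[(k, i, u)] + J[(k + 1, u if k < 3 else 0)]
                for (kk, ii, u) in C if kk == k and ii == i}
        u_best = min(cand, key=cand.get)
        J[(k, i)] = cand[u_best]
        policy[(k, i)] = u_best

print(J[(0, 0)])  # optimal cost from node A: 8
```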
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Computers & Operations Research, 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006 (RAMS'06), pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of the 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: an opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006 (PMAPS 2006), International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999 (PICA'99), Proceedings of the 21st IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems: life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems: cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006 (PMAPS 2006), International Conference on, pages 1–6, 2006.
[38] Alagar Rangan, Dimple Ahyagarajan, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006 (NAPS 2006), 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants: introducing intelligent maintenance system. In Intelligent Control and Automation, 2006 (WCICA 2006), The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
arbitrarily. At each iteration k, compute

H_k = min_{u∈Ω_U(X̄)} Σ_{j∈Ω_X} P(j,u,X̄) · [C(j,u,X̄) + h_k(j)]

h_{k+1}(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + h_k(j)] − H_k,  ∀i ∈ Ω_X

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + h_k(j)],  ∀i ∈ Ω_X
The sequence h_k will converge if the Markov decision process is unichain, and the algorithm converges to the optimal policy. In theory, the number of iterations needed is infinite.
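The relative value iteration updates above can be sketched in a few lines of code. The two-state, two-action model below (P, C, and the interpretation of the actions) is hypothetical and invented purely for illustration; H then approximates the optimal average cost per stage:

```python
# Relative value iteration for an average-cost MDP (minimal sketch).
# Hypothetical model:
# P[u][i][j]: probability of moving from state i to j under action u.
# C[u][i][j]: cost of that transition.
P = [[[0.9, 0.1], [0.4, 0.6]],   # action 0: "do nothing"
     [[1.0, 0.0], [1.0, 0.0]]]   # action 1: "maintain" (back to state 0)
C = [[[0.0, 0.0], [0.0, 10.0]],  # staying degraded (state 1) costs 10
     [[2.0, 2.0], [5.0, 5.0]]]   # maintenance costs 2 or 5

states, actions = range(2), range(2)
ref = 0                          # reference state X-bar, chosen arbitrarily
h = [0.0, 0.0]                   # h_0 initialized arbitrarily

def backup(i, u, h):
    # sum_j P(j,u,i) * [C(j,u,i) + h(j)]
    return sum(P[u][i][j] * (C[u][i][j] + h[j]) for j in states)

for _ in range(500):
    H = min(backup(ref, u, h) for u in actions)                  # H_k
    h = [min(backup(i, u, h) for u in actions) - H for i in states]

mu = [min(actions, key=lambda u: backup(i, u, h)) for i in states]
print(H, mu)   # H approximates the optimal average cost per stage
```

For this particular model the optimal policy is to maintain only in the degraded state, with average cost 5/11.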
6.6.2 Policy Iteration
The problem can also be solved using the policy iteration algorithm.

Initialisation: the reference state X̄ and the initial policy μ_0 can be chosen arbitrarily.
Step 1: Evaluation of the policy.
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω_X, stop the algorithm. Else, solve the system of equations

h_q(X̄) = 0
λ_q + h_q(i) = Σ_{j∈Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)],  ∀i ∈ Ω_X
Step 2: Policy improvement.

μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + h_q(j)],  ∀i ∈ Ω_X

q := q + 1
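The policy iteration loop above can be sketched as follows. The two-state, two-action model (P, C) is hypothetical and invented for illustration; with two states, the evaluation step's linear system (with h(0) = 0 at the reference state) reduces to a single equation in h(1), which is solved directly:

```python
# Policy iteration for an average-cost MDP (minimal sketch on a
# hypothetical two-state model; reference state X-bar is state 0).
P = [[[0.9, 0.1], [0.4, 0.6]],   # action 0
     [[1.0, 0.0], [1.0, 0.0]]]   # action 1
C = [[[0.0, 0.0], [0.0, 10.0]],
     [[2.0, 2.0], [5.0, 5.0]]]
states, actions = range(2), range(2)

def backup(i, u, h):
    return sum(P[u][i][j] * (C[u][i][j] + h[j]) for j in states)

def evaluate(mu):
    # Solve lam + h(i) = sum_j P(j,mu(i),i) [C + h(j)], with h(0) = 0.
    # i=0: lam = c0 + p01*h1 ; i=1: lam + h1 = c1 + p11*h1
    # =>   h1 = (c1 - c0) / (1 + p01 - p11)
    c0 = sum(P[mu[0]][0][j] * C[mu[0]][0][j] for j in states)
    c1 = sum(P[mu[1]][1][j] * C[mu[1]][1][j] for j in states)
    p01, p11 = P[mu[0]][0][1], P[mu[1]][1][1]
    h1 = (c1 - c0) / (1.0 + p01 - p11)
    lam = c0 + p01 * h1
    return lam, [0.0, h1]

mu = [0, 0]                        # initial policy, chosen arbitrarily
while True:
    lam, h = evaluate(mu)          # Step 1: policy evaluation
    new_mu = [min(actions, key=lambda u: backup(i, u, h))
              for i in states]     # Step 2: policy improvement
    if new_mu == mu:               # policy unchanged: optimal
        break
    mu = new_mu
print(mu, lam)   # optimal policy and average cost (5/11)
```

On this model the loop stops after two improvement steps, illustrating how quickly PI can converge from a reasonable starting policy.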
6.7 Linear Programming
The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.
For example, in the discounted IHSDP, the optimal cost-to-go function satisfies

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + α · J*(j)],  ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize  Σ_{i∈Ω_X} J(i)
Subject to  J(i) − α · Σ_{j∈Ω_X} P(j,u,i) · J(j) ≤ Σ_{j∈Ω_X} P(j,u,i) · C(j,u,i),  ∀i ∈ Ω_X, ∀u ∈ Ω_U(i)
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
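The standard LP formulation of a discounted MDP (maximize Σ_i J(i) subject to J(i) being below every one-step backup) can be sketched with a generic solver. The two-state model below is hypothetical, and scipy.optimize.linprog is assumed to be available; the LP result is cross-checked against plain value iteration on the same model:

```python
# LP formulation of a discounted MDP (minimal sketch; the two-state,
# two-action model P, C and the discount factor alpha are hypothetical).
from scipy.optimize import linprog

P = [[[0.9, 0.1], [0.4, 0.6]],
     [[1.0, 0.0], [1.0, 0.0]]]
C = [[[0.0, 0.0], [0.0, 10.0]],
     [[2.0, 2.0], [5.0, 5.0]]]
alpha, n = 0.9, 2

# Maximize sum_i J(i)  <=>  minimize -sum_i J(i), subject to
# J(i) - alpha * sum_j P(j,u,i) J(j) <= sum_j P(j,u,i) C(j,u,i)
A_ub, b_ub = [], []
for i in range(n):
    for u in range(n):
        A_ub.append([(1.0 if j == i else 0.0) - alpha * P[u][i][j]
                     for j in range(n)])
        b_ub.append(sum(P[u][i][j] * C[u][i][j] for j in range(n)))
res = linprog(c=[-1.0] * n, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n)   # J is a free variable
J_lp = res.x

# Cross-check against value iteration on the same model.
J = [0.0] * n
for _ in range(2000):
    J = [min(sum(P[u][i][j] * (C[u][i][j] + alpha * J[j]) for j in range(n))
             for u in range(n)) for i in range(n)]
print(J_lp, J)
```

The two solutions agree; additional (e.g. budget) constraints could simply be appended as extra rows of A_ub, which is the main attraction of the LP route.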
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, and the actions are not taken continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and are not treated in this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach from machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied while learning the environment (the MDP of the system) in parallel. This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space, and are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j,u,i) and costs C(j,u,i), if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ, using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6, and can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average-cost-to-go problems.
Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed. The cost-to-go resulting from the trajectory, starting from state X_k, is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

where V(X_k) is the cost-to-go of a trajectory starting from state X_k.
If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

where V(i_m) is the cost-to-go of the trajectory starting from state i at its m-th visit.
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)],  with γ = 1/m, where m is the index of the trajectory.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

where γ_{X_k} corresponds to 1/m, m being the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) is calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],  ∀k = 0, ..., l
TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],  ∀k = 0, ..., l
Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
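The TD(0) update can be sketched as follows, on a hypothetical stochastic shortest path problem invented for illustration (two states plus a terminal state; the exact cost-to-go values are easy to compute by hand, so the estimate can be checked):

```python
# TD(0) policy evaluation on a hypothetical stochastic shortest path
# problem with states 0, 1 and a terminal state 'T'.
import random
random.seed(1)

# Fixed policy: from 0 go to 1 (cost 1); from 1, reach T with
# probability 0.5 (cost 2) or stay in 1 (cost 2).
# Exact values: J(1) = 2 * E[steps from 1] = 4, and J(0) = 1 + J(1) = 5.
def step(x):
    if x == 0:
        return 1, 1.0
    return ('T' if random.random() < 0.5 else 1), 2.0

J = {0: 0.0, 1: 0.0, 'T': 0.0}
visits = {0: 0, 1: 0}
for _ in range(20000):                        # simulated trajectories
    x = 0
    while x != 'T':
        nxt, cost = step(x)
        visits[x] += 1
        gamma = 1.0 / visits[x]               # step size 1/m for state x
        J[x] += gamma * (cost + J[nxt] - J[x])   # TD(0) update
        x = nxt
print(J[0], J[1])   # approaches the exact values 5 and 4
```

Unlike plain policy evaluation by simulation, the estimate is updated at every transition, without waiting for the end of the trajectory.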
Q-factors: Once J^{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^{μ_k}(i,u) = Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + J^{μ_k}(j)]

Note that P(j,u,i) and C(j,u,i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q^{μ_k}(i,u)

This is in fact an approximate version of the policy iteration algorithm, since J^{μ_k} and Q^{μ_k} have been estimated using the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i,u) = Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + J*(j)]    (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i,u)    (7.2)
By combining the two equations, we obtain

Q*(i,u) = Σ_{j∈Ω_X} P(j,u,i) · [C(j,u,i) + min_{v∈Ω_U(j)} Q*(j,v)]    (7.3)

Q*(i,u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i,u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The exploration/exploitation trade-off: The convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
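The Q-learning update, together with a simple ε-greedy exploration/exploitation scheme, can be sketched as follows. The small stochastic shortest path model (a "good" and a "worn" state, with "operate" and "replace" actions) is hypothetical and chosen so that the exact Q-factors can be checked by hand:

```python
# Q-learning on a hypothetical stochastic shortest path problem.
# States: 0 (good), 1 (worn), 'T' (terminal).
# Action 0 "operate": cost 1 from state 0, cost 3 from state 1;
#   the job finishes (T) with probability 0.5, otherwise the state is 1.
# Action 1 "replace and finish": cost 4, always terminates.
# Exact Q-factors: Q*(0,0)=3, Q*(0,1)=4, Q*(1,0)=5, Q*(1,1)=4.
import random
random.seed(2)

def step(x, u):
    if u == 1:
        return 'T', 4.0
    cost = 1.0 if x == 0 else 3.0
    return ('T' if random.random() < 0.5 else 1), cost

Q = {(x, u): 0.0 for x in (0, 1) for u in (0, 1)}
visits = dict.fromkeys(Q, 0)
eps = 0.2                                   # exploration probability
for _ in range(30000):
    x = random.choice([0, 1])               # random start state
    while x != 'T':
        greedy = min((0, 1), key=lambda u: Q[(x, u)])
        u = random.choice([0, 1]) if random.random() < eps else greedy
        nxt, cost = step(x, u)
        visits[(x, u)] += 1
        gamma = 1.0 / visits[(x, u)]        # step size, as for TD
        target = cost + (0.0 if nxt == 'T'
                         else min(Q[(nxt, v)] for v in (0, 1)))
        Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
        x = nxt
print({k: round(v, 2) for k, v in Q.items()})
```

Note that no transition probabilities are used anywhere in the loop: the Q-factors are learned from the samples alone, which is the point of the method.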
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

• Using the direct learning approach presented in the preceding section for each sample of experience.

• Building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation, using direct learning.
7.4 Supervised Learning
With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; for large state and control spaces, however, they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J^μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a parameter vector that is optimized based on the available samples of J^μ. In the tabular representation investigated previously, J^μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics, for example.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist: the training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
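As a minimal sketch of the approximation idea, a linear architecture J̃(i, r) = r[0] + r[1]·i can be trained by stochastic gradient descent on (state, observed cost-to-go) samples. The training set below is synthetic: a hypothetical true function J(i) = 2 + 0.5·i observed with noise, standing in for the simulated samples discussed above:

```python
# Training a linear cost-to-go approximation J~(i, r) = r[0] + r[1]*i
# by stochastic gradient descent on squared error (minimal sketch).
import random
random.seed(3)

# Synthetic samples: states 0..9, hypothetical J(i) = 2 + 0.5*i + noise.
samples = [(i % 10, 2.0 + 0.5 * (i % 10) + random.gauss(0.0, 0.1))
           for i in range(200)]

r = [0.0, 0.0]                            # parameter vector to optimize
lr = 0.005                                # learning rate
for _ in range(300):                      # passes over the training set
    for i, v in samples:
        err = (r[0] + r[1] * i) - v       # J~(i, r) minus observed value
        r[0] -= lr * err                  # gradient step on squared error
        r[1] -= lr * err * i
print(r)   # should approach [2, 0.5]
```

Only the two numbers in r are stored, rather than one table entry per state, which is what makes the approach viable for large state spaces.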
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning then uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization, using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality: it means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  - Characteristics: the model can be non-stationary
  - Application in maintenance optimization: short-term maintenance scheduling
  - Method: value iteration
  - Disadvantage: limited state space (number of components)

Markov Decision Processes
  - Characteristics: stationary model
  - Applications in maintenance optimization:
      average cost-to-go: continuous-time condition monitoring maintenance optimization
      discounted: short-term maintenance optimization
      shortest path
  - Methods (classical):
      Value Iteration (VI): can converge fast for a high discount factor
      Policy Iteration (PI): faster in general
      Linear Programming: possible additional constraints; state space more limited than for VI and PI

Approximate Dynamic Programming for MDP
  - Characteristics: can handle large state spaces
  - Application in maintenance optimization: same as MDP, for larger systems
  - Methods: TD-learning, Q-learning
  - Advantage: can work without an explicit model

Semi-Markov Decision Processes
  - Characteristics: can optimize the inspection interval; complex (average cost-to-go approach)
  - Application in maintenance optimization: optimization for inspection-based maintenance
  - Methods: same as MDP
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component to make its principle easier to understand.
The price of electricity is considered an important factor that can influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be worthwhile to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was incorporated into the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could serve as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE: number of electricity scenarios
NW: number of working states for the component
NPM: number of preventive maintenance states for one component
NCM: number of corrective maintenance states for one component
Costs
CE(s, k): electricity cost at stage k for electricity state s
CI: cost per stage of interruption
CPM: cost per stage of preventive maintenance
CCM: cost per stage of corrective maintenance
CN(i): terminal cost if the component is in state i
Variables
i1: component state at the current stage
i2: electricity state at the current stage
j1: possible component state for the next stage
j2: possible electricity state for the next stage
State and Control Space
x1_k: component state at stage k
x2_k: electricity state at stage k
Probability function
λ(t): failure rate of the component at age t
λ(i): failure rate of the component in state Wi
Sets
Ω_x1: component state space
Ω_x2: electricity state space
Ω_U(i): decision space for state i
States notations
W: working state
PM: preventive maintenance state
CM: corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, …, N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.
• At each stage, it is possible to decide to replace the component in order to prevent corrective maintenance. The duration of a preventive replacement is NPM stages, with a cost of CPM per stage.
• If the system is not working, an interruption cost CI per stage is incurred.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, …, N−1. The electricity price may switch from one scenario to another during the time span; the probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal state.
• The manpower is assumed unlimited. Spare parts are not considered.
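The assumptions above can be collected into a small parameter container before implementing the model. A minimal sketch in Python; the class and all numerical values are illustrative assumptions, not taken from the thesis:

```python
from dataclasses import dataclass


@dataclass
class OneComponentParams:
    """Parameters of the one-component finite horizon model (hypothetical names)."""
    N: int       # number of stages
    Ts: float    # stage length, in hours
    N_PM: int    # duration of preventive maintenance, in stages
    N_CM: int    # duration of corrective maintenance, in stages
    C_PM: float  # cost per stage of preventive maintenance
    C_CM: float  # cost per stage of corrective maintenance
    C_I: float   # interruption cost per stage
    G: float     # average production, in kW

    @property
    def T(self) -> float:
        """Total time span T = N * Ts."""
        return self.N * self.Ts

    def energy_per_stage(self) -> float:
        """kWh produced during a stage when the unit is operating."""
        return self.G * self.Ts


# Illustrative instance: monthly stages over one year.
p = OneComponentParams(N=12, Ts=730.0, N_PM=1, N_CM=2,
                       C_PM=1000.0, C_CM=5000.0, C_I=2000.0, G=500.0)
print(p.T)                   # 8760.0 hours in total
print(p.energy_per_stage())  # 365000.0 kWh per operating stage
```

Grouping the parameters this way keeps the assumption T = N · Ts explicit in one place.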
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k)^T,   x1_k ∈ Ω_x1, x2_k ∈ Ω_x2    (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable, x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component are NCM and NPM, respectively.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always carried out. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50 if t > Tmax. This approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
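The truncation of the working states can be sketched as follows. Following the convention of Figure 9.1, the per-stage failure probability in state Wq is taken as Ts · λ(q · Ts); the Weibull form of λ(t) is purely an illustrative assumption:

```python
# Sketch of the W-state truncation. The failure rate lambda(t) is assumed
# known; a Weibull hazard is used here purely as an illustration.
def weibull_rate(t: float, beta: float = 2.0, eta: float = 8760.0) -> float:
    """Illustrative failure rate (per hour) at age t (hours)."""
    return (beta / eta) * (t / eta) ** (beta - 1) if t > 0 else 0.0


def truncated_rates(Ts: float, T_max: float):
    """Failure rate per W state, held constant once age T_max is reached."""
    N_W = round(T_max / Ts)  # number of W states (closest integer to Tmax/Ts)
    rates = [weibull_rate(min(q * Ts, T_max)) for q in range(N_W + 1)]
    return N_W, rates


Ts, T_max = 730.0, 4 * 730.0          # one-month stages, Tmax after 4 stages
N_W, rates = truncated_rates(Ts, T_max)
p_fail = [Ts * lam for lam in rates]  # per-stage failure probability Ts * lambda
print(N_W, [round(p, 4) for p in p_fail])
```

The list `p_fail` gives the probabilities attached to the Wq → CM1 arcs of Figure 9.1; beyond W_NW the last value is reused, which is the "constant λ(t) after Tmax" option.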
[Figure 9.1 shows a Markov chain over the states W0, W1, W2, W3, W4, PM1, CM1 and CM2: under u = 0, each Wq moves to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q) (W4 loops on itself with probability 1 − Ts·λ(4)); under u = 1 (dashed), each Wq moves to PM1 with probability 1; the PM and CM states progress deterministically (probability 1) back to W0.]

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, …, W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ω_x1 = {W0, …, WNW, PM1, …, PM(NPM−1), CM1, …, CM(NCM−1)}
Electricity scenario state
Electricity scenarios are associated with one state variable, x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, …, S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. Consequently, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2 plots the electricity price (SEK/MWh, roughly between 200 and 500) as a function of the stage k for three scenarios.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:
Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance
The decision space depends only on the component state i1:
Ω_U(i) = {0, 1}  if i1 ∈ {W1, …, WNW}
         ∅       otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:
P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
  = P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
  = P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).
The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios over a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

  i1                          u    j1       P(j1, u, i1)
  Wq, q ∈ {0, …, NW−1}        0    Wq+1     1 − λ(Wq)
  Wq, q ∈ {0, …, NW−1}        0    CM1      λ(Wq)
  WNW                         0    WNW      1 − λ(WNW)
  WNW                         0    CM1      λ(WNW)
  Wq, q ∈ {0, …, NW}          1    PM1      1
  PMq, q ∈ {1, …, NPM−2}      ∅    PMq+1    1
  PM(NPM−1)                   ∅    W0       1
  CMq, q ∈ {1, …, NCM−2}      ∅    CMq+1    1
  CM(NCM−1)                   ∅    W0       1
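Table 9.1 can be coded directly as a transition function. A sketch, assuming (as in Figure 9.1) that the per-stage failure probability in state Wq is p[q] ≈ Ts · λ(Wq); the state names and numerical values are illustrative:

```python
def component_transitions(N_W: int, N_PM: int, N_CM: int, p):
    """Nonzero transition probabilities P(j1 | u, i1) from Table 9.1.
    p[q] is the per-stage failure probability in state Wq (illustrative)."""
    P = {}  # (i1, u) -> {j1: probability}
    for q in range(N_W + 1):
        nxt = f"W{min(q + 1, N_W)}"           # the oldest state WNW loops on itself
        P[(f"W{q}", 0)] = {nxt: 1 - p[q], "CM1": p[q]}
        P[(f"W{q}", 1)] = {"PM1": 1.0}        # preventive replacement decided
    for q in range(1, N_PM - 1):              # deterministic PM progression
        P[(f"PM{q}", None)] = {f"PM{q + 1}": 1.0}
    P[(f"PM{N_PM - 1}", None)] = {"W0": 1.0}  # back to a new component
    for q in range(1, N_CM - 1):              # deterministic CM progression
        P[(f"CM{q}", None)] = {f"CM{q + 1}": 1.0}
    P[(f"CM{N_CM - 1}", None)] = {"W0": 1.0}
    return P


# Same sizes as the example of Figure 9.1; failure probabilities made up.
P = component_transitions(N_W=4, N_PM=2, N_CM=3, p=[0.01, 0.02, 0.03, 0.05, 0.08])
assert all(abs(sum(row.values()) - 1.0) < 1e-12 for row in P.values())
print(P[("W4", 0)])  # transitions out of the oldest working state
```

The row-sum check mirrors the fact that, for every (i1, u) pair with a nonempty decision, the probabilities in Table 9.1 sum to one.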
Table 9.2: Example of transition matrices for the electricity scenarios

  P1_E = | 1  0  0 |    P2_E = | 1/3  1/3  1/3 |    P3_E = | 0.6  0.2  0.2 |
         | 0  1  0 |           | 1/3  1/3  1/3 |           | 0.2  0.6  0.2 |
         | 0  0  1 |           | 1/3  1/3  1/3 |           | 0.2  0.2  0.6 |
Table 9.3: Example of transition probabilities over a 12-stage horizon

  Stage (k)    0     1     2     3     4     5     6     7     8     9     10    11
  Pk(j2, i2)   P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
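The non-stationary electricity transitions of Tables 9.2 and 9.3 amount to selecting one of three matrices at each stage. A short sketch (the function name is an assumption, not from the thesis):

```python
# Stage-dependent electricity scenario transitions (Tables 9.2 and 9.3).
P1_E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2_E = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
P3_E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Which matrix applies at each of the 12 stages (Table 9.3).
SCHEDULE = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E,
            P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]


def electricity_transition(k: int, i2: int, j2: int) -> float:
    """P_k(j2, i2): probability of moving from scenario i2 to j2 at stage k."""
    return SCHEDULE[k][i2][j2]


# Every row of every matrix is a probability distribution.
for M in (P1_E, P2_E, P3_E):
    assert all(abs(sum(row) - 1.0) < 1e-12 for row in M)
print(electricity_transition(3, 0, 0))  # stage 3 uses P3_E, so this is 0.6
```

The identity matrix P1_E keeps the scenario frozen outside the summer, while P2_E and P3_E implement the transient summer period discussed in Section 9.1.1.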
9.1.4.4 Cost Function
The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost of maintenance: CCM or CPM

• Cost of interruption: CI
Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

  i1                          u    j1       Ck(j, u, i)
  Wq, q ∈ {0, …, NW−1}        0    Wq+1     G · Ts · CE(i2, k)
  Wq, q ∈ {0, …, NW−1}        0    CM1      CI + CCM
  WNW                         0    WNW      G · Ts · CE(i2, k)
  WNW                         0    CM1      CI + CCM
  Wq                          1    PM1      CI + CPM
  PMq, q ∈ {1, …, NPM−2}      ∅    PMq+1    CI + CPM
  PM(NPM−1)                   ∅    W0       CI + CPM
  CMq, q ∈ {1, …, NCM−2}      ∅    CMq+1    CI + CCM
  CM(NCM−1)                   ∅    W0       CI + CCM
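The one-component model described above (state space, decisions, transition probabilities and costs) can be solved by backward induction, i.e. the value iteration algorithm for finite horizon problems. A compact sketch; the tiny instance at the end (NW = 1, two price scenarios) and all numerical values are illustrative assumptions, with the generation reward entered as a negative cost:

```python
import itertools


def solve_one_component(N, comp_P, elec_P, cost, terminal):
    """Backward induction for the one-component finite horizon model.
    comp_P[(i1, u)] -> {j1: prob}   component transitions (as in Table 9.1)
    elec_P[k][i2][j2]               electricity transitions at stage k
    cost(k, i1, u, j1, i2)          transition cost (as in Table 9.4)
    terminal(i1)                    terminal cost C_N(i1)."""
    comp_states = {i1 for (i1, _u) in comp_P}
    n_scen = len(elec_P[0])
    J = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    for i1, i2 in itertools.product(comp_states, range(n_scen)):
        J[N][(i1, i2)] = terminal(i1)
    for k in range(N - 1, -1, -1):
        for i1, i2 in itertools.product(comp_states, range(n_scen)):
            best_u, best_v = None, float("inf")
            for (s, u) in comp_P:            # controls available in state i1
                if s != i1:
                    continue
                v = 0.0
                for j1, p1 in comp_P[(i1, u)].items():
                    for j2, p2 in enumerate(elec_P[k][i2]):
                        v += p1 * p2 * (cost(k, i1, u, j1, i2) + J[k + 1][(j1, j2)])
                if v < best_v:
                    best_u, best_v = u, v
            J[k][(i1, i2)], policy[k][(i1, i2)] = best_v, best_u
    return J, policy


# Illustrative instance: NW = 1, NPM = 2, NCM = 2, two price scenarios.
comp_P = {("W0", 0): {"W1": 0.95, "CM1": 0.05},
          ("W1", 0): {"W1": 0.90, "CM1": 0.10},
          ("W0", 1): {"PM1": 1.0}, ("W1", 1): {"PM1": 1.0},
          ("PM1", None): {"W0": 1.0}, ("CM1", None): {"W0": 1.0}}
elec_P = [[[0.8, 0.2], [0.3, 0.7]]] * 4   # stationary here, for simplicity
prices = [0.5, 0.3]                       # SEK/kWh per scenario (made up)


def cost(k, i1, u, j1, i2):
    G, Ts, C_I, C_PM, C_CM = 100.0, 10.0, 50.0, 20.0, 80.0
    if u == 1 or i1.startswith("PM") or j1 == "PM1":
        return C_I + C_PM
    if i1.startswith("CM") or j1 == "CM1":
        return C_I + C_CM
    return -G * Ts * prices[i2]           # reward for the generated energy


J, policy = solve_one_component(4, comp_P, elec_P, cost, terminal=lambda i1: 0.0)
print(round(J[0][("W0", 0)], 1), policy[0][("W0", 0)])
```

In this toy instance the failure probability does not grow with age, so the optimal decision in the working states is to keep operating (u = 0); with an increasing λ, the same code would schedule preventive replacements.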
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but will need maintenance soon.
This could be very interesting if the interruption cost is high, or if the infrastructure needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC: number of components
NWc: number of working states for component c
NPMc: number of preventive maintenance states for component c
NCMc: number of corrective maintenance states for component c
Costs
CPMc: cost per stage of preventive maintenance for component c
CCMc: cost per stage of corrective maintenance for component c
CNc(i): terminal cost if component c is in state i
Variables
ic, c ∈ {1, …, NC}: state of component c at the current stage
iNC+1: electricity state at the current stage
jc, c ∈ {1, …, NC}: state of component c at the next stage
jNC+1: electricity state at the next stage
uc, c ∈ {1, …, NC}: decision variable for component c
State and Control Space
xc_k, c ∈ {1, …, NC}: state of component c at stage k
xc: a component state
xNC+1_k: electricity state at stage k
uc_k: maintenance decision for component c at stage k
Probability functions
λc(i): failure probability function for component c
Sets
Ω_xc: state space for component c
Ω_xNC+1: electricity state space
Ω_uc(ic): decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, …, NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.
• At each stage, it is possible to decide to replace a component in order to prevent corrective maintenance. The duration of a preventive replacement of component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is incurred whenever maintenance, of any kind, is performed on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1_k, …, xNC_k, xNC+1_k)^T    (9.2)

xc_k, c ∈ {1, …, NC}, represents the state of component c, and xNC+1_k represents the electricity state.
Component space

The numbers of CM and PM states for component c are NCMc and NPMc, respectively. The number of W states for each component c, NWc, is decided in the same way as for the one-component model.

The state space related to component c is denoted Ω_xc:

xc_k ∈ Ω_xc = {W0, …, WNWc, PM1, …, PM(NPMc−1), CM1, …, CM(NCMc−1)}
Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or do nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1_k, u2_k, …, uNC_k)^T    (9.3)
The decision space for each decision variable is defined by:

∀c ∈ {1, …, NC}:  Ω_uc(ic) = {0, 1}  if ic ∈ {W0, …, WNWc}
                              ∅       otherwise
9.2.4.3 Transition Probability
The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)                                              (9.4)
  = P((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) · Pk(jNC+1, iNC+1)        (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.

Case 1: If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, …, NC}: ic ∈ {W1, …, WNWc},

P((j1, …, jNC), 0, (i1, …, iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)
Case 2: If one of the components is in maintenance, or preventive maintenance is decided for at least one component:

P((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, uc, ic)  if uc = 1 or ic ∉ {W1, …, WNWc}
      1              if ic ∈ {W1, …, WNWc}, uc = 0 and jc = ic
      0              otherwise

That is, components that are maintained, or sent to maintenance, progress according to their own transition probabilities, while the working components are frozen in their current state since the system is not operating.
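The two cases above can be sketched as one joint transition function. The component-level transition table and the state encoding below are illustrative assumptions:

```python
def joint_transition(j, u, i, P_c, working):
    """Joint transition probability P((j1..jNC), (u1..uNC), (i1..iNC)).
    P_c[c](jc, uc, ic) is the one-component transition function of component c;
    working(s) is True for the working states (here every W state, a simplification)."""
    all_up = all(working(ic) for ic in i) and not any(u)
    prob = 1.0
    for c, (ic, jc, uc) in enumerate(zip(i, j, u)):
        if all_up:                        # Case 1: every component ages independently
            prob *= P_c[c](jc, 0, ic)
        elif uc == 1 or not working(ic):  # Case 2: maintained component progresses
            prob *= P_c[c](jc, uc, ic)
        elif jc == ic:                    # Case 2: working component is frozen
            prob *= 1.0
        else:
            return 0.0
    return prob


# Illustrative one-component transition table, shared by two identical components.
TABLE = {("W0", 0): {"W1": 0.9, "CM1": 0.1}, ("W1", 0): {"W1": 0.8, "CM1": 0.2},
         ("W0", 1): {"PM1": 1.0}, ("W1", 1): {"PM1": 1.0},
         ("PM1", None): {"W0": 1.0}, ("CM1", None): {"W0": 1.0}}
P_c = [lambda jc, uc, ic: TABLE.get((ic, uc), {}).get(jc, 0.0)] * 2
working = lambda s: s.startswith("W")

# Case 1: both components working and ageing independently.
p1 = joint_transition(("W1", "W1"), (0, 0), ("W0", "W0"), P_c, working)
# Case 2: component 2 in corrective maintenance, component 1 frozen at W0.
p2 = joint_transition(("W0", "W0"), (0, None), ("W0", "CM1"), P_c, working)
print(p1, p2)
```

The product structure is what keeps the multi-component model tractable: only the case split, not the joint dynamics, couples the components.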
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.
Case 1: If all the components are working, no maintenance is decided and no failure occurs, a reward for the produced electricity is obtained:

If ∀c ∈ {1, …, NC}: ic ∈ {W1, …, WNWc},

C((j1, …, jNC), 0, (i1, …, iNC)) = G · Ts · CE(iNC+1, k)
Case 2: When the system is in maintenance or fails during the stage, an interruption cost CI is incurred, together with the sum of the costs of all maintenance actions:

C((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) = CI + ∑_{c=1}^{NC} Cc

with

Cc = CCMc  if ic ∈ {CM1, …, CM(NCMc−1)} or jc = CM1
     CPMc  if ic ∈ {PM1, …, PM(NPMc−1)} or jc = PM1
     0     otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:
• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Other types of maintenance actions: in the model, replacement is the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is shown empirically to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities over a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the recent advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP could, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximation of a finite horizon model and must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
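The backward recursion above can be checked with a few lines of code; the arc costs C(k, i, j) are exactly those appearing in the computation:

```python
# Verification of the shortest path solution by backward induction.
# C[(k, i, j)]: cost of going from node i at stage k to node j at stage k+1,
# taken from the computation above.
C = {(0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
     (1, 0, 0): 4, (1, 0, 1): 6,
     (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
     (1, 2, 1): 5, (1, 2, 2): 2,
     (2, 0, 0): 2, (2, 0, 1): 5,
     (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
     (2, 2, 1): 1, (2, 2, 2): 2,
     (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7}

J = {(4, 0): 0}  # terminal cost at stage 4
for k in range(3, -1, -1):
    for i in {i for (kk, i, _j) in C if kk == k}:
        J[(k, i)] = min(C[(k, i, j)] + J[(k + 1, j)]
                        for (kk, ii, j) in C if (kk, ii) == (k, i))
print(J[(0, 0)])  # optimal cost from node A: 8
```

Running the recursion reproduces the values above: J*(A) = 8, J*(C) = 6, J*(F) = 5.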
Reference List
[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001
[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.
[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996
[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994
[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965
[8] R Bellman Dynamic Programming Princeton University Press Princeton1957
[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997
[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976
[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979
[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005
[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996
[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006
[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991
[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997
[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966
[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004
[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004
[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004
[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
For example, in the discounted IHSDP, the optimal cost-to-go satisfies

J(i) = min_{u∈Ω_U(i)} ∑_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J(j)],  ∀i ∈ Ω_X

J(i) is the solution of the following linear programming model:

Maximize    ∑_{i∈Ω_X} J(i)
Subject to  J(i) ≤ ∑_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J(j)],  ∀u, i
At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
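As an illustration, the LP formulation above can be set up for a small hypothetical MDP with an off-the-shelf solver. The sketch below assumes a two-state, two-action problem with invented transition probabilities and costs, and uses scipy's linprog; all numbers are illustrative only.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action discounted MDP (all numbers illustrative).
# P[u][i][j] = P(j, u, i); C[u][i][j] = C(j, u, i); alpha = discount factor.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],     # action 0 (e.g. keep running)
              [[0.99, 0.01], [0.9, 0.1]]])  # action 1 (e.g. maintain)
C = np.array([[[1.0, 10.0], [2.0, 10.0]],
              [[5.0, 5.0], [5.0, 5.0]]])
alpha = 0.9
n, m = 2, 2

# LP: maximize sum_i J(i) subject to
#   J(i) - alpha * sum_j P(j,u,i) J(j) <= sum_j P(j,u,i) C(j,u,i)  for all i, u.
# linprog minimizes, so the objective is -sum_i J(i).
A_ub, b_ub = [], []
for u in range(m):
    for i in range(n):
        row = -alpha * P[u, i, :]
        row[i] += 1.0
        A_ub.append(row)
        b_ub.append(P[u, i, :] @ C[u, i, :])
res = linprog(c=-np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n)
J = res.x  # optimal discounted cost-to-go for the toy problem
```

At the optimum, J satisfies the Bellman equation of this toy problem, which can be checked against a direct evaluation of the right-hand side.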
6.8 Efficiency of the Algorithms
For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quickly if the initial policy µ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].
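To make this concrete, here is a minimal policy iteration sketch for a hypothetical two-state, two-action discounted MDP (all numbers invented for illustration). Each iteration evaluates the current policy exactly by solving a linear system and then improves it greedily; on problems this small the loop terminates in a few iterations.

```python
import numpy as np

# Hypothetical discounted MDP: P[u][i][j] = P(j, u, i), C[u][i][j] = C(j, u, i).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.99, 0.01], [0.9, 0.1]]])
C = np.array([[[1.0, 10.0], [2.0, 10.0]],
              [[5.0, 5.0], [5.0, 5.0]]])
alpha, n, m = 0.9, 2, 2

def policy_iteration(policy):
    while True:
        # Policy evaluation: solve J = c_mu + alpha * P_mu J exactly.
        P_mu = np.array([P[policy[i], i, :] for i in range(n)])
        c_mu = np.array([P[policy[i], i, :] @ C[policy[i], i, :] for i in range(n)])
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
        # Policy improvement: greedy with respect to J.
        Q = np.array([[P[u, i, :] @ C[u, i, :] + alpha * (P[u, i, :] @ J)
                       for u in range(m)] for i in range(n)])
        new_policy = Q.argmin(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, J
        policy = new_policy

policy, J = policy_iteration(np.zeros(n, dtype=int))
```

The exact linear solve in the evaluation step is what distinguishes PI from the purely iterative sweeps of value iteration.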
6.9 Semi-Markov Decision Process
Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).
SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].
The time horizon is considered infinite, but the actions are not taken continuously (problems with continuous decision making refer to optimal control theory).

SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how an SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for Markov Decision Processes - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).
The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The approximate RL methods are extensions of the methods presented in Section 7.2; they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
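A minimal sketch of such sample generation, for a hypothetical two-state, two-action model (the transition probabilities and costs are invented for illustration):

```python
import random

# Hypothetical model: P[u][i][j] = P(j, u, i), C[u][i][j] = C(j, u, i).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.99, 0.01], [0.9, 0.1]]]
C = [[[1.0, 10.0], [2.0, 10.0]],
     [[5.0, 5.0], [5.0, 5.0]]]

def sample(x, u, rng):
    """Simulate one transition (X_k, X_{k+1}, U_k, C_k) from state x, control u."""
    x_next = 0 if rng.random() < P[u][x][0] else 1
    return (x, x_next, u, C[u][x][x_next])

rng = random.Random(0)
trajectory = []
x = 0
for k in range(5):
    u = rng.randrange(2)        # here: samples drawn under a random policy
    s = sample(x, u, rng)
    trajectory.append(s)
    x = s[1]                    # continue the trajectory from the observed state
```

The same sample format is consumed by the TD and Q-learning methods below, whether the samples come from such a simulator or from real-time operation.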
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a way similar to the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy µ and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, µ(Xk)) has been observed.

The cost-to-go resulting from the trajectory starting from the state Xk is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V(Xk): cost-to-go of a trajectory starting from state Xk.
If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i, m)

V(i, m): cost-to-go of the trajectory starting from state i after the mth visit.
A recursive form of the method can be formulated:

J(i) := J(i) + γ·[V(i, m) − J(i)],  with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(Xk) := J(Xk) + γXk·[V(Xk) − J(Xk)]

with γXk corresponding to 1/m, where m is the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) must be calculated from the whole trajectory, and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = C(Xk, Xk+1) + V(Xk+1).

At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assume that the lth transition has just been generated. Then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) := J(Xk) + γXk·[C(Xl, Xl+1) + J(Xl+1) − J(Xl)],  ∀k = 0, ..., l
TD(λ)

A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J(Xk) := J(Xk) + γXk·λ^(l−k)·[C(Xl, Xl+1) + J(Xl+1) − J(Xl)],  ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0; the TD(0) algorithm is

J(Xk) := J(Xk) + γXk·[C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
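The TD(0) update can be sketched on a toy stochastic shortest path problem. The chain below is hypothetical: from each state the system either ages one step (cost 1.0) or fails directly to the terminal state (cost 5.0); the step size γ is 1/m, as in the text.

```python
import random

# Hypothetical 4-state chain with terminal state 4: from state x the system
# moves to x+1 (cost 1.0) with probability 0.9, or jumps to the terminal
# state (cost 5.0) with probability 0.1.
N = 4                       # non-terminal states 0..3; state 4 is terminal
J = [0.0] * (N + 1)
visits = [0] * (N + 1)

random.seed(0)
for episode in range(20000):
    x = 0
    while x < N:
        if random.random() < 0.9:
            x_next, cost = x + 1, 1.0
        else:
            x_next, cost = N, 5.0
        visits[x] += 1
        gamma = 1.0 / visits[x]                       # step size 1/m
        J[x] += gamma * (cost + J[x_next] - J[x])     # TD(0) update
        x = x_next
```

For this chain the exact cost-to-go can be computed by hand (e.g. J(3) = 0.9·1 + 0.1·5 = 1.4), and the TD(0) estimates approach these values as trajectories accumulate.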
Q-factors

Once Jµk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Qµk(i, u) = Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + Jµk(j)]

Note that the transition probabilities P(j, u, i) and costs C(j, u, i) must be known for this step. The improved policy is

µk+1(i) = argmin_{u∈ΩU(i)} Qµk(i, u)

This is in fact an approximate version of the policy iteration algorithm, since Jµk and Qµk have been estimated from samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation, and the Q-learning algorithm is based on (7.3). Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ)·Q(Xk, Uk) + γ·[C(Xk+1, Uk, Xk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
The trade-off between exploration and exploitation. The convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called the greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
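A common way to implement this trade-off is an ε-greedy rule: with probability ε a random control is explored, otherwise the current greedy control is exploited. The sketch below applies it to a hypothetical two-state, two-action discounted problem (state 1 a degraded state, action 1 a maintenance action; all probabilities and costs invented), with γ = 1/m as for TD.

```python
import random

# Hypothetical model: P[u][i][j] = P(j, u, i), C[u][i][j] = C(j, u, i).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.99, 0.01], [0.9, 0.1]]]
C = [[[1.0, 10.0], [2.0, 10.0]],
     [[5.0, 5.0], [5.0, 5.0]]]
alpha, eps = 0.9, 0.1

Q = [[0.0, 0.0], [0.0, 0.0]]
counts = [[0, 0], [0, 0]]
random.seed(1)
x = 0
for k in range(200000):
    # epsilon-greedy: explore with probability eps, otherwise exploit
    if random.random() < eps:
        u = random.randrange(2)
    else:
        u = 0 if Q[x][0] <= Q[x][1] else 1
    x_next = 0 if random.random() < P[u][x][0] else 1
    cost = C[u][x][x_next]
    counts[x][u] += 1
    gamma = 1.0 / counts[x][u]                       # step size 1/m
    Q[x][u] += gamma * (cost + alpha * min(Q[x_next]) - Q[x][u])
    x = x_next
```

After enough samples, the greedy policy extracted from Q (run when in good condition, maintain when degraded, for these particular numbers) matches the optimal policy of the underlying toy MDP.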
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; however, for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jµ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that is optimized based on the available samples of Jµ. In the tabular representation investigated previously, Jµ(i) was stored for all values of i; with an approximation structure, only the vector r is stored.

Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jµ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics, for example.
A general approach to a supervised learning problem can be:
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist: the training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
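As a small sketch of the idea (with a hypothetical true cost-to-go and hand-picked features), a linear architecture J̃(i, r) = φ(i)·r can be fitted by least squares to noisy samples; only the three parameters in r are then stored, instead of a table with one entry per state.

```python
import numpy as np

# Hypothetical setting: states 0..999, true cost-to-go 0.5*i^2 + 3*i + 7
# observed with noise. The feature choice is a design decision.
rng = np.random.default_rng(0)

def phi(i):
    # Features of state i: bias, age, age squared
    return np.array([1.0, i, i * i])

states = rng.integers(0, 1000, size=200)            # sampled states
targets = 0.5 * states**2 + 3.0 * states + 7.0      # "observed" costs-to-go
targets = targets + rng.normal(0.0, 10.0, size=200) # sampling noise

Phi = np.array([phi(i) for i in states])
r, *_ = np.linalg.lstsq(Phi, targets, rcond=None)   # train: least squares

def predict(i):
    # Approximated cost-to-go J_tilde(i, r) for any state, visited or not
    return phi(i) @ r
```

The fitted r recovers the underlying coefficients closely despite the noise, and predict generalizes to states that were never sampled, which is exactly the property required of the approximators discussed above.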
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for the short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original maintenance time of each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance; a unit that is not in maintenance can fail during a stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are not only minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for the monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature considered only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solved with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

• Finite Horizon Dynamic Programming. Characteristics: model can be non-stationary. Possible application in maintenance optimization: short-term maintenance scheduling. Method: value iteration. Advantages/disadvantages: limited state space (number of components).

• Markov Decision Processes. Characteristics: stationary model, with average cost-to-go, discounted, or shortest path approaches. Possible applications: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted). Methods: classical methods for MDP - value iteration (VI), which can converge fast for a high discount factor; policy iteration (PI), faster in general; linear programming, which allows additional constraints but a state space more limited than for VI and PI.

• Semi-Markov Decision Processes. Characteristics: can optimize the inspection interval. Possible application: optimization for inspection-based maintenance. Methods: same as MDP. Advantages/disadvantages: complex (average cost-to-go approach).

• Approximate Dynamic Programming. Characteristics: can handle larger state spaces than classical MDP methods. Possible application: same as MDP, for larger systems. Methods: TD-learning, Q-learning. Advantages/disadvantages: can work without an explicit model.
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that can influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices before doing maintenance.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was incorporated into the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in low electricity prices for the rest of the year. This observation can be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption can serve as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

NE: number of electricity scenarios
NW: number of working states for the component
NPM: number of preventive maintenance states for one component
NCM: number of corrective maintenance states for one component

Costs

CE(s, k): electricity price at stage k for the electricity state s
CI: cost per stage for interruption
CPM: cost per stage of preventive maintenance
CCM: cost per stage of corrective maintenance
CN(i): terminal cost if the component is in state i

Variables

i1: component state at the current stage
i2: electricity state at the current stage
j1: possible component state for the next stage
j2: possible electricity state for the next stage

State and Control Space

x1k: component state at stage k
x2k: electricity state at stage k

Probability Functions

λ(t): failure rate of the component at age t
λ(i): failure rate of the component in state Wi

Sets

Ωx1: component state space
Ωx2: electricity state space
ΩU(i): decision space for state i

State Notations

W: working state
PM: preventive maintenance state
CM: corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption, CI per stage, is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. The electricity price may switch from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component, and Ωx2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by the state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it underwent preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always performed. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
[Diagram: states W0-W4, PM1, CM1 and CM2. For u = 0 (solid lines), each Wq moves to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); W4 stays in W4 with probability 1 − Ts·λ(4). For u = 1 (dashed lines), each Wq moves to PM1. The maintenance states advance with probability 1.]

Figure 9.1: Example of a Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state

Electricity scenarios are associated with the state variable x2k. There are NE possible states for this variable, each corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden, where hydro power is a large part of the electricity generation; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
[Plot: electricity price (SEK/MWh, axis from 200 to 500) as a function of the stage, around stages k−1, k and k+1, for Scenario 1, Scenario 2 and Scenario 3.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and ΩU(i) = ∅ otherwise.
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q·Ts).

The transition probabilities for the component state are stationary and can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1 | u | j1 | P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | λ(Wq)
WNW | 0 | WNW | 1 − λ(WNW)
WNW | 0 | CM1 | λ(WNW)
Wq, q ∈ {0, ..., NW} | 1 | PM1 | 1
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | 1
PMNPM−1 | ∅ | W0 | 1
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | 1
CMNCM−1 | ∅ | W0 | 1
Table 9.2: Example of transition matrices for electricity scenarios

P1E = [1 0 0; 0 1 0; 0 0 1]

P2E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]

P3E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• reward for electricity generation, G·Ts·CE(i2, k) (depends on the electricity scenario state i2 and the stage k);

• costs for maintenance, CCM or CPM;

• cost for interruption, CI.

Moreover, a terminal cost, denoted CN, can be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1 | u | j1 | Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | G·Ts·CE(i2, k)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | CI + CCM
WNW | 0 | WNW | G·Ts·CE(i2, k)
WNW | 0 | CM1 | CI + CCM
Wq | 1 | PM1 | CI + CPM
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | CI + CPM
PMNPM−1 | ∅ | W0 | CI + CPM
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | CI + CCM
CMNCM−1 | ∅ | W0 | CI + CCM
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to also do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could then be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c
Costs
CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i
Variables
ic, c ∈ {1,...,NC}    State of component c at the current stage
iNC+1                 State of the electricity at the current stage
jc, c ∈ {1,...,NC}    State of component c at the next stage
jNC+1                 State of the electricity at the next stage
uc, c ∈ {1,...,NC}    Decision variable for component c
State and Control Space
xck, c ∈ {1,...,NC}   State of component c at stage k
xc                    A component state
xNC+1,k               Electricity state at stage k
uck                   Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc         State space for component c
ΩxNC+1      Electricity state space
Ωuc(ic)     Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1,...,NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh is produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal state of component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector, as in (9.2):

    Xk = (x1k, ..., xNC,k, xNC+1,k)^T    (9.2)

xck, c ∈ {1,...,NC}, represents the state of component c; xNC+1,k represents the electricity state.
Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space related to component c is noted Ωxc:

    xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc-1), CM1, ..., CM(NCMc-1)}

Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or to do nothing, depending on the state of the system.
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

    Uk = (u1k, u2k, ..., uNC,k)^T    (9.3)

The decision space for each decision variable is defined by:

    ∀c ∈ {1,...,NC}:  Ωuc(ic) = {0, 1} if ic ∈ {W0,...,WNWc},  ∅ otherwise
9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

    P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
      = P((j1,...,jNC) | (u1,...,uNC), (i1,...,iNC)) · P(jNC+1 | iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1 | iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1,...,NC}: ic ∈ {W0,...,WNWc} and uc = 0, then

    P((j1,...,jNC) | 0, (i1,...,iNC)) = Π_{c=1..NC} P(jc | 0, ic)
Case 2

If at least one component is in maintenance, or if preventive maintenance is decided for at least one component, then

    P((j1,...,jNC) | (u1,...,uNC), (i1,...,iNC)) = Π_{c=1..NC} Pc

with

    Pc = P(jc | uc, ic)   if uc = 1 or ic ∉ {W0,...,WNWc}
    Pc = 1                if uc = 0, ic ∈ {W0,...,WNWc} and jc = ic
    Pc = 0                otherwise

(a working component on which nothing is done does not age, since the system is not working during the stage).
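The two cases transcribe directly into code. In this sketch, `P_comp[c][u]` stands for the single-component transition matrix of component c under decision u, and `working[c]` for the set of its working-state indices; both are hypothetical containers, not part of the model's notation:

```python
import numpy as np

def joint_transition_prob(j, u, i, P_comp, working):
    """P((j1,...,jNC) | (u1,...,uNC), (i1,...,iNC)) for the series system."""
    NC = len(i)
    all_working = all(i[c] in working[c] for c in range(NC))
    if all_working and all(uc == 0 for uc in u):
        # Case 1: the system runs, components age independently.
        return float(np.prod([P_comp[c][0][i[c], j[c]] for c in range(NC)]))
    # Case 2: the system is down during the stage.
    p = 1.0
    for c in range(NC):
        if u[c] == 1 or i[c] not in working[c]:
            # Maintained, or already in maintenance (pass u[c] = 0 there,
            # since the maintenance rows are the same under both decisions).
            p *= P_comp[c][u[c]][i[c], j[c]]
        else:
            p *= 1.0 if j[c] == i[c] else 0.0  # working component is frozen
    return p
```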
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1,...,NC}: ic ∈ {W0,...,WNWc} and uc = 0, and no failure occurs, then

    C((j1,...,jNC) | 0, (i1,...,iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

    C((j1,...,jNC) | (u1,...,uNC), (i1,...,iNC)) = CI + Σ_{c=1..NC} Cc

with

    Cc = CCMc   if ic ∈ {CM1,...,CM(NCMc-1)} or jc = CM1
    Cc = CPMc   if ic ∈ {PM1,...,PM(NPMc-1)} or jc = PM1
    Cc = 0      otherwise
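The same two cases give the stage cost. A sketch, with all container names and numbers hypothetical, and the generation reward written as a negative cost so that minimizing total cost maximizes profit:

```python
def stage_cost(j, u, i, i_elec, k, m):
    """C_k((j1..jNC) | (u1..uNC), (i1..iNC)); `m` bundles the model data."""
    NC = len(i)
    producing = (all(i[c] in m["working"][c] for c in range(NC))
                 and all(uc == 0 for uc in u)
                 and all(j[c] in m["working"][c] for c in range(NC)))
    if producing:
        # Case 1: reward for the energy G * Ts produced during the stage.
        return -m["G"] * m["Ts"] * m["C_E"][i_elec][k]
    # Case 2: interruption cost plus all ongoing or started maintenance costs.
    cost = m["C_I"]
    for c in range(NC):
        if i[c] in m["cm_states"][c] or j[c] == m["CM1"][c]:
            cost += m["C_CM"][c]
        elif i[c] in m["pm_states"][c] or j[c] == m["PM1"][c]:
            cost += m["C_PM"][c]
    return cost
```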
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality, there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically found to converge fastest; however, for high discount rates, the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields, such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of the complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3,0,0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3,1,0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3,2,0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2,0,0), J*3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2,0,0), J*3(1) + C(2,0,1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2,1,0), J*3(1) + C(2,1,1), J*3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2,1,0), J*3(1) + C(2,1,1), J*3(2) + C(2,1,2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2,2,1), J*3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2,2,1), J*3(2) + C(2,2,2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1,0,0), J*2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1,0,0), J*2(1) + C(1,0,1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1,1,0), J*2(1) + C(1,1,1), J*2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1,1,0), J*2(1) + C(1,1,1), J*2(2) + C(1,1,2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1,2,1), J*2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1,2,1), J*2(2) + C(1,2,2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0,0,0), J*1(1) + C(0,0,1), J*1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0,0,0), J*1(1) + C(0,0,1), J*1(2) + C(0,0,2)} = 2
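The backward recursion above is short enough to check mechanically. In the sketch below, states at each stage are numbered 0, 1, 2 (so A = 0 at stage 0; B, C, D = 0, 1, 2 at stage 1, and so on), and the decision u indexes the successor state; it reproduces J*0(A) = 8 and the optimal first decision u*(A) = 2:

```python
# Arc costs C[k][i][u] = C(k, i, u), read off the example graph.
C = [
    {0: {0: 2, 1: 4, 2: 3}},                                    # stage 0: A
    {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},  # stage 1: B, C, D
    {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},  # stage 2: E, F, G
    {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                          # stage 3: H, I, J
]

N = len(C)
J = [dict() for _ in range(N + 1)]
J[N][0] = 0                                   # terminal cost phi(0) = 0
policy = [dict() for _ in range(N)]

# Backward value iteration: J_k(i) = min_u [ C(k, i, u) + J_{k+1}(u) ]
for k in range(N - 1, -1, -1):
    for i, arcs in C[k].items():
        best_u = min(arcs, key=lambda u: arcs[u] + J[k + 1][u])
        J[k][i] = arcs[best_u] + J[k + 1][best_u]
        policy[k][i] = best_u

print(J[0][0], policy[0][0])                  # prints: 8 2
```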
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A.-H. Mohamed. Inspection, maintenance and replacement models. Computers & Operations Research, 22(4):435-441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464-469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers / Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75-83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156-162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467-476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15-24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157-179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75-82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452-456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1-23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411-435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533-537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179-186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387-393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150-155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145-149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507-515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117-118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31-37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223-229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293-294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556-560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176-1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167-173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23-28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469-489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36]. The time horizon is considered infinite, and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.
Chapter 7
Approximate Methods for
Markov Decision Process -
Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach from machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any possible input. Many approaches are possible, such as artificial neural networks, decision tree learning, or Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2: they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk+1, Uk, Xk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j | u, i) and costs C(j, u, i), if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ, using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average-cost-to-go problems.
Policy evaluation by simulation. Assume that a trajectory (X0,...,XN) has been generated according to the policy μ, and that the sequence of transition costs C(Xk, Xk+1) = C(Xk+1, μ(Xk), Xk) has been observed.

The cost-to-go resulting from the trajectory, starting from the state Xk, is

    V(Xk) = Σ_{n=k..N-1} C(Xn, Xn+1)

V(Xk): cost-to-go of the trajectory starting from state Xk.

If a certain number of trajectories has been generated, and state i has been visited K times in these trajectories, then J(i) can be estimated by

    J(i) = (1/K) Σ_{m=1..K} V(i_m)

V(i_m): cost-to-go of the trajectory starting from state i after the m-th visit.

A recursive form of the method can be formulated:

    J(i) := J(i) + γ · [V(i_m) - J(i)],  with γ = 1/m, m being the number of the trajectory.

From a trajectory point of view:

    J(Xk) := J(Xk) + γ_Xk · [V(Xk) - J(Xk)]

γ_Xk corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) must be computed from the whole trajectory, so the update can only be made once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = C(Xk, Xk+1) + V(Xk+1).

At each transition of the trajectory, the cost-to-go function of the states of the trajectory is updated. Assume that the l-th transition is being generated. Then J(Xk) is updated for all the states that have been visited previously during the trajectory:

    J(Xk) := J(Xk) + γ_Xk · [C(Xl, Xl+1) + J(Xl+1) - J(Xl)],  ∀k = 0,...,l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

    J(Xk) := J(Xk) + γ_Xk · λ^(l-k) · [C(Xl, Xl+1) + J(Xl+1) - J(Xl)],  ∀k = 0,...,l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

    J(Xk) := J(Xk) + γ_Xk · [C(Xk, Xk+1) + J(Xk+1) - J(Xk)]
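As a concrete sketch, the TD(0) update can be run directly against a simulator of the system. Everything below is hypothetical: `simulate_step(x, u)` stands for one observed transition (next state, cost), and the step size follows the 1/m rule described above:

```python
def td0_policy_evaluation(simulate_step, states, terminal, policy,
                          start, n_trajectories=2000):
    """TD(0) evaluation of a fixed policy on a stochastic shortest path
    problem; `simulate_step` is the only access to the system."""
    J = {x: 0.0 for x in states}
    visits = {x: 0 for x in states}
    for _ in range(n_trajectories):
        x = start
        while x != terminal:
            x_next, cost = simulate_step(x, policy[x])
            visits[x] += 1
            gamma = 1.0 / visits[x]                    # step size 1/m
            J_next = 0.0 if x_next == terminal else J[x_next]
            J[x] += gamma * (cost + J_next - J[x])     # TD(0) update
            x = x_next
    return J
```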
Q-factors
Once J^μk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

    Q^μk(i, u) = Σ_{j∈ΩX} P(j | u, i) · [C(j, u, i) + J^μk(j)]

Note that C(j, u, i) must be known. The improved policy is

    μ_{k+1}(i) = argmin_{u∈ΩU(i)} Q^μk(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^μk and Q^μk have been estimated using the samples.
7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

    Q*(i, u) = Σ_{j∈ΩX} P(j | u, i) · [C(j, u, i) + J*(j)]    (7.1)

The optimality equation can be rewritten in terms of Q-factors:

    J*(i) = min_{u∈ΩU(i)} Q*(i, u)    (7.2)

By combining the two equations, we obtain

    Q*(i, u) = Σ_{j∈ΩX} P(j | u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]    (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3). Q(i, u) can be initialized arbitrarily; for each sample (Xk, Xk+1, Uk, Ck), do

    Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

    Q(Xk, Uk) := (1 - γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
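A tabular sketch of this update, using an ε-greedy rule for the exploration/exploitation trade-off discussed below. All interfaces are hypothetical: `simulate_step(x, u)` returns one observed transition and `actions(x)` is the decision space of state x:

```python
import random

def q_learning(simulate_step, actions, terminal, start,
               n_episodes=5000, epsilon=0.1, seed=0):
    """Tabular Q-learning for a stochastic shortest path problem."""
    Q, visits = {}, {}
    rng = random.Random(seed)
    q = lambda x, u: Q.get((x, u), 0.0)
    for _ in range(n_episodes):
        x = start
        while x != terminal:
            # epsilon-greedy: mostly exploit the greedy policy, sometimes explore
            if rng.random() < epsilon:
                u = rng.choice(actions(x))
            else:
                u = min(actions(x), key=lambda a: q(x, a))
            x_next, cost = simulate_step(x, u)
            visits[(x, u)] = visits.get((x, u), 0) + 1
            gamma = 1.0 / visits[(x, u)]          # step size, as for TD
            target = cost + (0.0 if x_next == terminal
                             else min(q(x_next, a) for a in actions(x_next)))
            Q[(x, u)] = (1 - gamma) * q(x, u) + gamma * target
            x = x_next
    return Q
```

On a small example the estimated Q-factors approach the solution of (7.3), and the greedy policy argmin_u Q(i, u) is then the improved policy.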
The trade-off between exploration and exploitation. The convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system, through simulation using direct learning.
7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces, this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jμ(i). It is replaced by a suitable approximation J~(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the table representation investigated previously, Jμ(i) was stored for all values of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) - J~(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
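A minimal illustration of the parametric idea, on entirely hypothetical data: noisy samples of a cost-to-go over 100 states, approximated by J~(i, r) = r0 + r1·i + r2·i², fitted by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical samples of J^mu(i) on states i = 0..99 (noisy quadratic truth):
states = np.arange(100.0)
samples = 0.05 * states**2 + 2.0 * states + rng.normal(0.0, 1.0, states.size)

# Approximation structure J~(i, r) = r0 + r1*i + r2*i^2: instead of storing
# a table of 100 values, only the 3-vector r is stored.
features = np.column_stack([np.ones_like(states), states, states**2])
r, *_ = np.linalg.lstsq(features, samples, rcond=None)

approx = features @ r          # J~(i, r) evaluated on all states
```

Three parameters replace one hundred table entries, at the price of an approximation error that the supervised learning step tries to keep small.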
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function, and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist: the training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
Chapter 8
Review of Models for
Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
812 Stochastic Models
In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered for the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete-Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each is modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP model. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
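The kind of tabular update such an RL approach relies on can be sketched as follows. The two-state "normal/degraded" system, its costs and its transition probabilities are invented purely for illustration and have no counterpart in [24]:

```python
import random

random.seed(1)

# Hypothetical system: state 0 = normal, state 1 = degraded;
# action 0 = keep running, action 1 = maintain (pay 5, back to normal).
def step(s, a):
    if a == 1:                      # maintenance: cost 5, restore to normal
        return 0, 5.0
    if s == 0:                      # running while normal: may degrade
        return (1 if random.random() < 0.3 else 0), 0.0
    return 1, 20.0                  # running while degraded: high penalty

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration
s = 0
for _ in range(20000):
    # epsilon-greedy action selection (costs are minimized)
    a = random.choice((0, 1)) if random.random() < eps else min((0, 1), key=lambda x: Q[s, x])
    s2, cost = step(s, a)
    # Q-learning update toward cost + discounted best next-state value
    Q[s, a] += alpha * (cost + gamma * min(Q[s2, 0], Q[s2, 1]) - Q[s, a])
    s = s2
```

After training, the learned Q-values prefer maintaining in the degraded state and running in the normal state, which is the policy one would expect for these invented costs.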
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Application in maintenance optimization: short-term maintenance optimization and scheduling
  Method: value iteration
  Limitation: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; solved with classical methods
  Possible approaches for MDP:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI), which can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; Policy Iteration (PI), faster in general
  - Shortest path: Linear Programming; possible additional constraints; state space limited, as for VI and PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces
  Application: same as MDP, for systems larger than classical MDP methods allow
  Methods: TD-learning, Q-learning; can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Application: optimization for inspection-based maintenance
  Method: same as MDP
  Limitation: complex (average cost-to-go approach)
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
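The backward recursion of finite horizon value iteration can be sketched generically. The function below is a skeleton, assuming the transition probabilities P, transition costs C and terminal costs defined later in this chapter are supplied as callables; the tiny one-state model used to exercise it at the bottom is hypothetical:

```python
def backward_induction(N, states, decisions, P, C, terminal):
    """Finite horizon value iteration (backward induction).

    P(k, j, u, i): probability of moving to state j from state i under
    decision u at stage k; C(k, j, u, i): the corresponding transition cost;
    terminal(i): terminal cost at stage N.
    Returns the cost-to-go J[k][i] and an optimal policy mu[k][i]."""
    J = [dict() for _ in range(N + 1)]
    mu = [dict() for _ in range(N)]
    for i in states:
        J[N][i] = terminal(i)
    for k in range(N - 1, -1, -1):          # sweep backward over the stages
        for i in states:
            best_u, best = None, float("inf")
            for u in decisions(i):
                cost = sum(P(k, j, u, i) * (C(k, j, u, i) + J[k + 1][j])
                           for j in states)
                if cost < best:
                    best_u, best = u, cost
            J[k][i] = best
            mu[k][i] = best_u
    return J, mu

# Minimal hypothetical check: one state, one decision, unit cost per stage.
J, mu = backward_induction(
    N=3, states=[0], decisions=lambda i: [0],
    P=lambda k, j, u, i: 1.0, C=lambda k, j, u, i: 1.0, terminal=lambda i: 0.0)
```

With three stages of unit cost and zero terminal cost, the cost-to-go from stage 0 is 3, which matches the recursion by inspection.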
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE: number of electricity scenarios
NW: number of working states for the component
NPM: number of preventive maintenance states for the component
NCM: number of corrective maintenance states for the component
Costs
CE(s, k): electricity cost at stage k for electricity state s
CI: cost per stage for interruption
CPM: cost per stage of preventive maintenance
CCM: cost per stage of corrective maintenance
CN(i): terminal cost if the component is in state i
Variables
i1: component state at the current stage
i2: electricity state at the current stage
j1: possible component state for the next stage
j2: possible electricity state for the next stage
State and Control Space

x1k: component state at stage k
x2k: electricity state at stage k
Probability function
λ(t): failure rate of the component at age t
λ(i): failure rate of the component in state Wi
Sets
Ωx1: component state space
Ωx2: electricity state space
ΩU(i): decision space for state i
States notations
W: working state
PM: preventive maintenance state
CM: corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.
• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.
• If the system is not working, a cost for interruption CI per stage is considered.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one statevariable x1
k There are three types of possible states for the variable Normalstate (W) when the component is working corrective maintenance (CM) states ifthe component is in maintenance due to failure and preventive maintenance (PM)states The meaning of a state is that the component has been in the corresponingcondition during the last stage For example if the component is in a state PMit means that during the last stage it has undertaken preventive maintenance Thenumber of CM and PM states for the component corresponds respectively to NCM
and NPM
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; Tmax can then correspond, for example, to the time when λ(t) exceeds 50%. This second approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
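The truncation described above can be sketched in a few lines; the numerical values of Tmax, Ts and the failure rate function below are hypothetical, chosen only to make the computation concrete:

```python
# Sketch of the state-space truncation: beyond age Tmax the failure rate is
# held constant at lambda(Tmax). All numbers here are hypothetical.
Tmax = 10.0            # age limit (same time unit as Ts)
Ts = 0.25              # stage length
NW = round(Tmax / Ts)  # number of W states: Tmax/Ts, or the closest integer

def lam(t):
    return 0.01 + 0.002 * t  # hypothetical increasing failure rate lambda(t)

def lam_state(q):
    # Failure rate assigned to working state Wq; constant once Tmax is reached
    return lam(min(q, NW) * Ts)
```

With these numbers the model has NW = 40 working states, and any state index beyond 40 maps to the same, constant failure rate.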
Figure 9.1: Example of Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
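The construction of Ωx1 can be written directly from this definition; the string labels below are just one possible encoding of the states:

```python
def component_states(NW, NPM, NCM):
    """Build the component state space
    {W0..WNW, PM1..PM(NPM-1), CM1..CM(NCM-1)}."""
    return (["W%d" % q for q in range(NW + 1)]
            + ["PM%d" % q for q in range(1, NPM)]
            + ["CM%d" % q for q in range(1, NCM)])

# The example of Figure 9.1: NW = 4, NPM = 2, NCM = 3
omega_x1 = component_states(4, 2, 3)
```

For the parameters of Figure 9.1 this yields exactly the eight states {W0, ..., W4, PM1, CM1, CM2} listed above.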
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3 (electricity price in SEK/MWh over stages k−1, k, k+1).
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance
The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).
The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1 | u | j1 | P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | λ(Wq)
WNW | 0 | WNW | 1 − λ(WNW)
WNW | 0 | CM1 | λ(WNW)
Wq, q ∈ {0, ..., NW} | 1 | PM1 | 1
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | 1
PMNPM−1 | ∅ | W0 | 1
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | 1
CMNCM−1 | ∅ | W0 | 1
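Table 9.1 can be turned into an explicit transition function; the sketch below follows the table row by row, with None standing for the empty decision ∅ (the numeric failure probabilities used in the check are hypothetical):

```python
def transition_probs(NW, NPM, NCM, lam_W):
    """Build {(i, u): {j: P(j, u, i)}} following Table 9.1.
    lam_W[q] is the failure probability from working state Wq during one stage;
    u = 0: do nothing, u = 1: preventive replacement, None: no decision."""
    def nxt(kind, q, n):  # next maintenance state, or W0 when maintenance ends
        return "%s%d" % (kind, q + 1) if q + 1 <= n - 1 else "W0"
    P = {}
    for q in range(NW + 1):
        age = "W%d" % min(q + 1, NW)       # ageing; W_NW stays in W_NW
        fail = "CM1" if NCM > 1 else "W0"  # CM1 (resp. PM1) is W0 if NCM = 1 (NPM = 1)
        P[("W%d" % q, 0)] = {age: 1.0 - lam_W[q], fail: lam_W[q]}
        P[("W%d" % q, 1)] = {"PM1" if NPM > 1 else "W0": 1.0}
    for q in range(1, NPM):
        P[("PM%d" % q, None)] = {nxt("PM", q, NPM): 1.0}
    for q in range(1, NCM):
        P[("CM%d" % q, None)] = {nxt("CM", q, NCM): 1.0}
    return P

# Hypothetical per-stage failure probabilities for NW = 4, NPM = 2, NCM = 3
P = transition_probs(4, 2, 3, [0.05, 0.08, 0.12, 0.18, 0.25])
```

A useful sanity check, whatever the numbers, is that every row of the resulting transition structure sums to one.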
Table 9.2: Example of transition matrices for electricity scenarios

P1E = [1 0 0; 0 1 0; 0 0 1]
P2E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]
P3E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):   0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):  P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
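The matrices of Table 9.2 and the schedule of Table 9.3 can be written out and checked numerically; this is a small sketch, with the initial scenario distribution chosen arbitrarily:

```python
import numpy as np

# Transition matrices of Table 9.2 (rows: current scenario i2, columns: next j2)
P1E = np.eye(3)
P2E = np.full((3, 3), 1.0 / 3.0)
P3E = np.array([[0.6, 0.2, 0.2],
                [0.2, 0.6, 0.2],
                [0.2, 0.2, 0.6]])

# Stage schedule of Table 9.3 for k = 0..11
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

# Propagate an initial scenario distribution through the 12 stages
p = np.array([1.0, 0.0, 0.0])   # start in scenario S1 (arbitrary choice)
for M in schedule:
    p = p @ M
```

Because P2E maps any distribution to the uniform one, the scenario distribution after the full horizon is uniform regardless of the starting scenario, which matches the "transient summer, stable winter" interpretation in the text.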
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: CCM or CPM
• Cost for interruption: CI

Moreover, a terminal cost denoted CN could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
A possible terminal cost CN(i) is defined for each possible terminal state i of the component.
Table 9.4: Transition costs

i1 | u | j1 | Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | CI + CCM
WNW | 0 | WNW | G · Ts · CE(i2, k)
WNW | 0 | CM1 | CI + CCM
Wq | 1 | PM1 | CI + CPM
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | CI + CPM
PMNPM−1 | ∅ | W0 | CI + CPM
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | CI + CCM
CMNCM−1 | ∅ | W0 | CI + CCM
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.
This could be very interesting if the interruption cost is high, or if the structure needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC: number of components
NWc: number of working states for component c
NPMc: number of preventive maintenance states for component c
NCMc: number of corrective maintenance states for component c
Costs
CPMc: cost per stage of preventive maintenance for component c
CCMc: cost per stage of corrective maintenance for component c
CNc(i): terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}: state of component c at the current stage
iNC+1: electricity state at the current stage
jc, c ∈ {1, ..., NC}: state of component c for the next stage
jNC+1: electricity state for the next stage
uc, c ∈ {1, ..., NC}: decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}: state of component c at stage k
xc: a component state
xNC+1k: electricity state at stage k
uck: maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc: state space for component c
ΩxNC+1: electricity state space
Ωuc(ic): decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.
• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is done on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.
Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.
The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity Space
Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector
Uk = (u1k, u2k, ..., uNCk)   (9.3)
The decision space for each decision variable can be defined by
∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, ∅ otherwise
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity states, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component states transitions
The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.
Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ick ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏ c=1..NC P(jc, 0, ic)
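Case 1 is a plain product over components. As a sketch, with a hypothetical two-component probability table:

```python
def joint_prob(P_comp, j, i):
    """P((j1..jNC), 0, (i1..iNC)) as the product of per-component probabilities.
    P_comp[c] maps (jc, ic) to P(jc, 0, ic) for component c."""
    p = 1.0
    for c, table in enumerate(P_comp):
        p *= table.get((j[c], i[c]), 0.0)  # unlisted transitions have probability 0
    return p

# Hypothetical two-component example: each component either ages or fails
P_comp = [{("W1", "W0"): 0.9, ("CM1", "W0"): 0.1},
          {("W2", "W1"): 0.8, ("CM1", "W1"): 0.2}]
```

For instance, the probability that both components age normally from (W0, W1) is 0.9 · 0.8 = 0.72.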
Case 2
If one of the components is in maintenance, or the decision of preventive maintenance is taken:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏ c=1..NC Pc

with Pc =
  P(jc, 1, ic) if uc = 1 or ic ∉ {W1, ..., WNWc}
  1 if ic ∈ {W0, ..., WNWc} and jc = ic
  0 otherwise
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.
Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ick ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)
Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ c=1..NC Cc

with Cc =
  CCMc if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
  CPMc if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
  0 otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:
• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.
• Include other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair: it is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.
• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is being able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, in the literature, few finite horizon models are proposed. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
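The recursion above can be replayed programmatically; the arc costs C(k, i, j) below are exactly those appearing in the worked solution:

```python
# Arc costs C[(k, i, j)] taken from the worked example above
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}

# Backward recursion: J(k, i) = min over arcs (i -> j) of cost + J(k+1, j)
J = {(4, 0): 0}
for k in range(3, -1, -1):
    for i in {i for (kk, i, _) in C if kk == k}:
        J[(k, i)] = min(c + J[(k + 1, j)]
                        for (kk, ii, j), c in C.items() if kk == k and ii == i)
```

Running this reproduces the hand computation, in particular J*(A) = J(0, 0) = 8.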
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
66
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
67
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005
Chapter 7
Approximate Methods for Markov Decision Process - Reinforcement Learning
Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques make it possible to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further reading, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].
7.1 Introduction
The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.
One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).
Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning
The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.
The cost-to-go resulting from the trajectory starting from the state X_k is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): cost-to-go of the trajectory starting from state X_k.
If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): cost-to-go of the trajectory starting from state i after its mth visit.
A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the index of the trajectory.

From a trajectory point of view:

J(X_k) = J(X_k) + γ_Xk · [V(X_k) − J(X_k)]

γ_Xk corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.
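As a sketch of the policy-evaluation-by-simulation scheme above, the recursive averaging J(i) := J(i) + (1/m)·[V(i_m) − J(i)] can be implemented on a hypothetical one-state chain (state "a", cost 1 per stage, termination probability 0.5, so the true cost-to-go is 2); the chain and its costs are illustrative assumptions:

```python
import random

# Monte Carlo policy evaluation of a fixed policy: each visited state's
# remaining trajectory cost is averaged recursively with step 1/m.

def simulate(start, step):
    """Generate one trajectory [(state, cost), ...] until the terminal state."""
    traj, x = [], start
    while x != "end":
        x_next, cost = step(x)
        traj.append((x, cost))
        x = x_next
    return traj

def evaluate(step, start, n_traj=2000, seed=0):
    random.seed(seed)
    J, visits = {}, {}
    for _ in range(n_traj):
        traj = simulate(start, step)
        tail = 0.0  # cost-to-go of the remaining trajectory
        for x, cost in reversed(traj):
            tail += cost
            visits[x] = visits.get(x, 0) + 1
            m = visits[x]
            J[x] = J.get(x, 0.0) + (tail - J.get(x, 0.0)) / m
    return J

# Toy policy: from "a", pay 1 per stage and terminate with probability 0.5.
def step(x):
    return ("end" if random.random() < 0.5 else "a"), 1.0

J = evaluate(step, "a")
print(round(J["a"], 2))  # close to the true value 2.0
```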
With the preceding algorithm, V(X_k) can only be calculated from the whole trajectory, and can therefore only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the lth transition has just been generated. Then J(X_k) is updated for all the states visited previously during the trajectory:

J(X_k) = J(X_k) + γ_Xk · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], for all k = 0, ..., l

TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J(X_k) = J(X_k) + γ_Xk · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], for all k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0; the TD(0) algorithm is

J(X_l) = J(X_l) + γ_Xl · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]
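The TD(0) update can be sketched on the same kind of toy chain as above (one non-terminal state "a", cost 1 per stage, termination probability 0.5; these are illustrative assumptions): J is updated after every observed transition instead of at the end of the trajectory.

```python
import random

# TD(0) policy evaluation: J(x) := J(x) + gamma_x * (C(x, x') + J(x') - J(x)),
# with step size gamma_x = 1/m, m the number of visits to x so far.
random.seed(1)
J = {"a": 0.0, "end": 0.0}  # terminal state kept at 0
visits = {"a": 0}

for _ in range(5000):       # 5000 simulated trajectories
    x = "a"
    while x != "end":
        # sample one transition of the fixed policy
        x_next = "end" if random.random() < 0.5 else "a"
        cost = 1.0
        visits[x] += 1
        gamma = 1.0 / visits[x]
        J[x] += gamma * (cost + J[x_next] - J[x])
        x = x_next

print(round(J["a"], 2))  # approaches the true cost-to-go 2.0
```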
Q-factors: Once J^{μ_k}(i) has been estimated using the TD algorithm, a policy improvement step can be performed by evaluating the Q-factors, defined by

Q^{μ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J^{μ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q^{μ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^{μ_k} and Q^{μ_k} have been estimated from samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)    (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]    (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 − γ) · Q(X_k, U_k) + γ · [C(X_k, X_{k+1}, U_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
The exploration/exploitation trade-off: The convergence of the algorithm to the optimal solution would require that every pair (i, u) is tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
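The update rule and the trade-off can be sketched together with ε-greedy exploration. The two-state replacement problem below is a hypothetical illustration (states, costs, discount factor, and constant step size are all assumptions; the chapter's shortest-path formulation is undiscounted): in state "worn" the controls are u = 0 (keep running, with a random failure cost) and u = 1 (replace).

```python
import random

# Q-learning with epsilon-greedy exploration on a toy replacement problem.
random.seed(2)

def sample_transition(x, u):
    """Return (next_state, cost) for the hypothetical model."""
    if u == 1:                  # replace: pay 5, component as new
        return "new", 5.0
    if x == "new":              # keep running a new component
        return "worn", 1.0
    # keep running a worn component: fails with probability 0.4
    return ("new", 20.0) if random.random() < 0.4 else ("worn", 1.0)

states, controls = ["new", "worn"], [0, 1]
Q = {(x, u): 0.0 for x in states for u in controls}
gamma, discount, eps = 0.05, 0.9, 0.1   # step size, discount, exploration rate

x = "new"
for _ in range(20000):
    # explore with probability eps, otherwise exploit the current greedy policy
    u = random.choice(controls) if random.random() < eps \
        else min(controls, key=lambda v: Q[(x, v)])
    x_next, cost = sample_transition(x, u)
    target = cost + discount * min(Q[(x_next, v)] for v in controls)
    Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
    x = x_next

policy = {s: min(controls, key=lambda u: Q[(s, u)]) for s in states}
print(policy)
```

For these costs the greedy policy found should be to keep running a new component and to replace a worn one.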
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section on each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.
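A minimal sketch of the model-building half of indirect learning, with a hypothetical batch of samples (X_k, X_{k+1}, U_k, C_k): transition probabilities are estimated by counting and costs by averaging.

```python
from collections import defaultdict

# Hypothetical observed samples (state i, next state j, control u, cost c).
samples = [
    ("w0", "w1", 0, 1.0),
    ("w0", "w1", 0, 1.0),
    ("w0", "cm", 0, 9.0),
    ("w1", "w0", 1, 5.0),
]

counts = defaultdict(int)      # (i, u, j) -> number of observed transitions
totals = defaultdict(int)      # (i, u)    -> number of observations
cost_sum = defaultdict(float)  # (i, u, j) -> accumulated cost

for i, j, u, c in samples:
    counts[(i, u, j)] += 1
    totals[(i, u)] += 1
    cost_sum[(i, u, j)] += c

# Estimated model: P^(j, u, i) and C^(j, u, i)
P_hat = {k: counts[k] / totals[(k[0], k[1])] for k in counts}
C_hat = {k: cost_sum[k] / counts[k] for k in counts}

print(round(P_hat[("w0", 0, "w1")], 3))  # 2 of 3 observed transitions: 0.667
```

The estimated model can then be used for off-line training by simulation, exactly as in the direct learning methods above.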
7.4 Supervised Learning
With the methods presented in the previous section, the cost-to-go or Q-functions are represented in tabular form. These approaches are suitable for moderate-size problems, but for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J^μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that is optimized based on the available samples of J^μ. In the tabular representation investigated previously, J^μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods include artificial neural networks, kernel-based methods, tree-based methods, and Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the supervised learning performed in reinforcement learning is that a real training set does not exist: the training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
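As a sketch of such an approximation structure, a linear model J̃(i, r) = r0 + r1·φ(i), with the single feature φ(i) = i, can be fitted by least squares to sampled costs-to-go. The samples and the underlying trend below are illustrative assumptions, not taken from any model in this thesis.

```python
import random

random.seed(3)

# Hypothetical samples (state i, observed cost-to-go V) from simulation,
# generated around an assumed underlying linear trend 2 + 0.5 * i.
samples = [(i, 2.0 + 0.5 * i + random.gauss(0.0, 0.2)) for i in range(50)]

# Ordinary least squares for the parameter vector r = (r0, r1).
n = len(samples)
sx = sum(i for i, _ in samples)
sy = sum(v for _, v in samples)
sxx = sum(i * i for i, _ in samples)
sxy = sum(i * v for i, v in samples)
r1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
r0 = (sy - r1 * sx) / n

# Only r = (r0, r1) is stored instead of a table of J(i) for every i;
# the fitted function also generalizes to states never sampled.
print(round(r0, 1), round(r1, 1))  # close to the true parameters (2.0, 0.5)
```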
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP model. Major and minor maintenance actions are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the current expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an existing model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Method: Finite Horizon Dynamic Programming
  Characteristics: model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance optimization, scheduling
  Optimization method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Method: Markov Decision Processes
  Characteristics: stationary model; possible approaches are average cost-to-go, discounted cost, and shortest path
  Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
  Optimization methods: value iteration (VI), policy iteration (PI), linear programming
  Advantages/disadvantages: VI can converge fast for a high discount factor; PI is faster in general; linear programming allows additional constraints; state space limited for VI, PI and LP

Method: Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: optimization of inspection-based maintenance
  Optimization method: same as MDP (average cost-to-go approach)
  Advantages/disadvantages: complex

Method: Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than classical MDP methods
  Possible application in maintenance optimization: same as MDP, for larger systems
  Optimization methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was taken into account in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

NE: Number of electricity scenarios
NW: Number of working states for the component
NPM: Number of preventive maintenance states for one component
NCM: Number of corrective maintenance states for one component

Costs

CE(s, k): Electricity cost at stage k for the electricity state s
CI: Cost per stage for interruption
CPM: Cost per stage of preventive maintenance
CCM: Cost per stage of corrective maintenance
CN(i): Terminal cost if the component is in state i

Variables

i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage

State and Control Space

x1_k: Component state at stage k
x2_k: Electricity state at stage k

Probability functions

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state Wi

Sets

Ω_x1: Component state space
Ω_x2: Electricity state space
Ω_U(i): Decision space for state i

State notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption of CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description
9.1.4.1 State Space
The state vector X_k is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (N_X = 2).

The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k),  x1_k ∈ Ω_x1, x2_k ∈ Ω_x2    (9.1)

Ω_x1 is the set of possible states for the component, and Ω_x2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
[Figure: Markov chain over the states W0, W1, W2, W3, W4, PM1, CM1 and CM2. Under u = 0, each state Wq moves to the next W state with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q) (W4 stays in W4 with probability 1 − Ts·λ(4)); under u = 1, each W state moves to PM1; the PM and CM states progress towards W0 with probability 1.]

Figure 9.1: Example of the Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_(NPM−1), CM1, ..., CM_(NCM−1)}
Electricity scenario state
Electricity scenarios are associated with one state variable x2k There areNE possible
states for this variable each state corresponding to one possible electricity scenariox2k isin Ωx
2
= S1 SNe The electricity price of the scenario S at stage k is givenby the electricity price function CE(S k) Figure 92 shows an example for threepossibles scenarios
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
[Figure: electricity price (SEK/MWh, ranging roughly from 200 to 500) as a function of the stage (..., k−1, k, k+1, ...) for the three scenarios.]

Figure 9.2: Example of electricity scenarios, NE = 3.
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}
ΩU(i) = ∅ otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1. Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW-1}      0   Wq+1     1 - Ts·λ(Wq)
Wq, q ∈ {0, ..., NW-1}      0   CM1      Ts·λ(Wq)
WNW                         0   WNW      1 - Ts·λ(WNW)
WNW                         0   CM1      Ts·λ(WNW)
Wq, q ∈ {0, ..., NW}        1   PM1      1
PMq, q ∈ {1, ..., NPM-2}    ∅   PMq+1    1
PM(NPM-1)                   ∅   W0       1
CMq, q ∈ {1, ..., NCM-2}    ∅   CMq+1    1
CM(NCM-1)                   ∅   W0       1
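As a sketch, the stationary component transition matrices of Table 9.1 can be assembled programmatically. The example below uses the instance of Figure 9.1 (NW = 4, NPM = 2, NCM = 3) with a hypothetical linearly increasing failure rate; the function `lam` and all numeric values are illustrative assumptions, not part of the model.

```python
# Sketch: component transition matrices for the instance of Figure 9.1
# (NW = 4, NPM = 2, NCM = 3).  lam(q) is a hypothetical failure rate at
# age q*Ts; Ts is the stage length (keep Ts*lam(q) <= 1).

N_W, N_PM, N_CM = 4, 2, 3
Ts = 1.0
states = [f"W{q}" for q in range(N_W + 1)] + \
         [f"PM{q}" for q in range(1, N_PM)] + \
         [f"CM{q}" for q in range(1, N_CM)]
idx = {s: i for i, s in enumerate(states)}

def lam(q):
    """Hypothetical linearly increasing failure rate."""
    return 0.02 + 0.03 * q

n = len(states)
P0 = [[0.0] * n for _ in range(n)]   # u = 0: no preventive maintenance
P1 = [[0.0] * n for _ in range(n)]   # u = 1: preventive maintenance

for q in range(N_W + 1):
    nxt = f"W{min(q + 1, N_W)}"      # W_NW stays in W_NW while working
    P0[idx[f"W{q}"]][idx[nxt]] = 1.0 - Ts * lam(q)
    P0[idx[f"W{q}"]][idx["CM1"]] = Ts * lam(q)
    P1[idx[f"W{q}"]][idx["PM1"]] = 1.0

# Maintenance states evolve deterministically (their decision space is
# empty), ending in W0 (as-new component); stored in both matrices.
for P in (P0, P1):
    for q in range(1, N_PM):
        tgt = f"PM{q + 1}" if q < N_PM - 1 else "W0"
        P[idx[f"PM{q}"]][idx[tgt]] = 1.0
    for q in range(1, N_CM):
        tgt = f"CM{q + 1}" if q < N_CM - 1 else "W0"
        P[idx[f"CM{q}"]][idx[tgt]] = 1.0

# Every row is a probability distribution.
for row in P0:
    assert abs(sum(row) - 1.0) < 1e-12
```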
Table 9.2: Example of transition matrices for the electricity scenarios

P1E =
( 1    0    0   )
( 0    1    0   )
( 0    0    1   )

P2E =
( 1/3  1/3  1/3 )
( 1/3  1/3  1/3 )
( 1/3  1/3  1/3 )

P3E =
( 0.6  0.2  0.2 )
( 0.2  0.6  0.2 )
( 0.2  0.2  0.6 )
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
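A minimal sketch of how the non-stationary scenario chain of Tables 9.2 and 9.3 can be simulated; the matrix values and the stage schedule are taken from the example, while the sampling helper is an illustrative assumption.

```python
import random

# Transition matrices from Table 9.2 (rows: current scenario i2, cols: j2).
P1E = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P2E = [[1/3, 1/3, 1/3]] * 3
P3E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Stage schedule from Table 9.3 (12-stage horizon).
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def sample_path(s0, rng):
    """Sample one scenario trajectory over the horizon, starting in s0."""
    path, s = [s0], s0
    for P in schedule:
        s = rng.choices(range(3), weights=P[s])[0]
        path.append(s)
    return path

path = sample_path(0, random.Random(0))
# During the first three stages the matrix is the identity, so the
# scenario cannot change before stage 3.
assert path[0] == path[1] == path[2] == path[3]
```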
9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: CCM or CPM
• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state of the component.
Table 9.4: Transition costs

i1                          u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW-1}      0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW-1}      0   CM1      CI + CCM
WNW                         0   WNW      G · Ts · CE(i2, k)
WNW                         0   CM1      CI + CCM
Wq                          1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM-2}    ∅   PMq+1    CI + CPM
PM(NPM-1)                   ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM-2}    ∅   CMq+1    CI + CCM
CM(NCM-1)                   ∅   W0       CI + CCM
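The transition costs of Table 9.4 reduce to a small lookup. The sketch below encodes them for the one-component model; all numeric values and the price function `CE` are hypothetical, and the generation reward is given a negative sign here (an assumption) so that every entry can be treated as a cost to minimize.

```python
# Sketch of the transition cost Ck(j, u, i) of Table 9.4 (one component).
# G, Ts, C_CM, C_PM, C_I and the price function CE are hypothetical values.

G, Ts = 1000.0, 24.0            # average production (kW), stage length (h)
C_CM, C_PM, C_I = 5000.0, 2000.0, 1000.0

def CE(i2, k):
    """Hypothetical electricity price (SEK/kWh) of scenario i2 at stage k."""
    return [0.50, 0.35, 0.25][i2]

def cost(i1, u, j1, i2, k):
    """Transition cost: negative values are generation rewards."""
    if i1.startswith("W") and u == 0 and j1 != "CM1":
        return -G * Ts * CE(i2, k)          # reward for the energy produced
    if j1 == "CM1" or i1.startswith("CM"):
        return C_I + C_CM                   # corrective maintenance stage
    return C_I + C_PM                       # preventive maintenance stage
```

Any transition that interrupts production (failure, CM or PM stage) pays the interruption cost CI on top of the maintenance cost, exactly as in the table.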
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it could be profitable to also do maintenance on some components that are still working but would need maintenance soon.

This can be very interesting if the interruption cost is high, or if the infrastructure needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notation for the Proposed Model

Numbers
NC          Number of components
NWc         Number of working states for component c
NPMc        Number of preventive maintenance states for component c
NCMc        Number of corrective maintenance states for component c

Costs
CPMc        Cost per stage of preventive maintenance for component c
CCMc        Cost per stage of corrective maintenance for component c
CNc(i)      Terminal cost if component c is in state i

Variables
ic, c ∈ {1, ..., NC}     State of component c at the current stage
iNC+1                    Electricity state at the current stage
jc, c ∈ {1, ..., NC}     State of component c at the next stage
jNC+1                    Electricity state at the next stage
uc, c ∈ {1, ..., NC}     Decision variable for component c

State and Control Spaces
xc_k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
xNC+1_k                  Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k

Probability functions
λc(i)       Failure probability function for component c

Sets
Ωxc         State space for component c
ΩxNC+1      Electricity state space
Ωuc(ic)     Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.
• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is incurred whenever maintenance is done on the system.
• The average production of the generating unit is G kW. If no component of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal state of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector, as in (9.2):

Xk = (x1_k, ..., xNC_k, xNC+1_k)^T    (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.

Component space
The numbers of CM and PM states for component c are NCMc and NPMc, respectively. The number of W states for component c, NWc, is decided in the same way as for the one-component model.

The state space related to component c is noted Ωxc:

xc_k ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc-1), CM1, ..., CM(NCMc-1)}

Electricity space
Same as for the one-component model in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether or not to do preventive maintenance, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1_k, u2_k, ..., uNC_k)^T    (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and Ωuc(ic) = ∅ otherwise
9.2.4.3 Transition Probabilities

The component state variables xc are independent of the electricity state xNC+1. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)    (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are the same as in the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. Consequently, different cases must be considered.

Case 1
If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)
Case 2
If at least one component is in maintenance, or preventive maintenance is decided for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC Pc

with

Pc = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
Pc = 1              if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
Pc = 0              otherwise
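A sketch of the two cases, where the one-component transition probabilities P(jc, u, ic) are supplied as a function `p_comp`; the helper names, the working-state test and the toy probability table at the end are illustrative assumptions.

```python
# Sketch: joint component-state transition probability for the series system
# (Cases 1 and 2).  p_comp(j, u, i) is assumed to return the one-component
# probabilities of Table 9.1; states are strings such as "W2", "PM1", "CM1".

def is_working(state, n_w):
    """True if state is one of the ageing working states W1..W_NWc."""
    return state.startswith("W") and 1 <= int(state[1:]) <= n_w

def joint_prob(j, u, i, n_w, p_comp):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) following Cases 1 and 2."""
    system_up = all(is_working(ic, n_w[c]) for c, ic in enumerate(i)) \
        and not any(u)
    prob = 1.0
    for c, (jc, uc, ic) in enumerate(zip(j, u, i)):
        if system_up:                          # Case 1: every component ages
            prob *= p_comp(jc, 0, ic)
        elif uc == 1 or not is_working(ic, n_w[c]):
            prob *= p_comp(jc, 1, ic)          # maintenance-related transition
        elif jc != ic:
            return 0.0                         # working component is frozen
    return prob

# Toy one-component table (hypothetical numbers) used as a quick check.
def p_comp(j, u, i):
    table = {("W2", 0, "W1"): 0.9, ("CM1", 0, "W1"): 0.1,
             ("PM1", 1, "W1"): 1.0, ("W0", 1, "CM1"): 1.0}
    return table.get((j, u, i), 0.0)

assert abs(joint_prob(("W2", "W2"), (0, 0), ("W1", "W1"), (4, 4), p_comp) - 0.81) < 1e-12
```

When the system is down (second component in CM1), the first component is frozen in its state, so only the maintenance transition contributes to the product.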
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is incurred, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC Cc

with

Cc = CCMc   if ic ∈ {CM1, ..., CM(NCMc-1)} or jc = CM1
Cc = CPMc   if ic ∈ {PM1, ..., PM(NPMc-1)} or jc = PM1
Cc = 0      otherwise
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modeled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair: a stochastic repair time could be modeled by adding transition probabilities for the maintenance states.
• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts over a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
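The backward recursion above can be reproduced with a short script. The arc costs C(k, i, u) below are taken from the worked example, with u indexing the successor state and the single terminal state labeled 0:

```python
# Sketch: value iteration (backward recursion) for the shortest path example.
# C[(k, i, u)] is the arc cost from state i at stage k to successor state u.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,        # A -> B, C, D
    (1, 0, 0): 4, (1, 0, 1): 6,                      # B -> E, F
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,        # C -> E, F, G
    (1, 2, 1): 5, (1, 2, 2): 2,                      # D -> F, G
    (2, 0, 0): 2, (2, 0, 1): 5,                      # E -> H, I
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,        # F -> H, I, J
    (2, 2, 1): 1, (2, 2, 2): 2,                      # G -> I, J
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,        # H, I, J -> terminal
}

N = 4
J = {(N, 0): 0}                  # terminal cost phi(0) = 0
policy = {}
for k in range(N - 1, -1, -1):
    for i in {i for (kk, i, u) in C if kk == k}:
        options = {u: C[(k, i, u)] + J[(k + 1, u if k < N - 1 else 0)]
                   for (kk, ii, u) in C if (kk, ii) == (k, i)}
        policy[(k, i)] = min(options, key=options.get)
        J[(k, i)] = options[policy[(k, i)]]

assert J[(0, 0)] == 8 and policy[(0, 0)] == 2   # J*(A) = 8, first move to D
```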
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A-H. Mohamed. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: an opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
works methods as a supervised learning technique. This approach has also been called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and that use built-in mechanisms for improving their actions through a reinforcement mechanism [13].
The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious modeling step does not need to be carried out first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly.

Further RL methods extend the methods presented in Section 7.2 by using supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.
7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).
7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ, using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6, and can be seen in a similar way as modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample of the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average-cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.
The cost-to-go resulting from the trajectory, starting from state X_k, is

V(X_k) = Σ_{n=k}^{N-1} C(X_n, X_{n+1})

where V(X_k) is the cost-to-go of a trajectory starting from state X_k.

If a certain number of trajectories has been generated, and state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) · Σ_{m=1}^{K} V(i, m)

where V(i, m) is the cost-to-go of the trajectory starting from state i after the m-th visit.

A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i, m) - J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view,

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) - J(X_k)]

where γ_{X_k} corresponds to 1/m, with m the number of times X_k has already been visited by trajectories.
With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, so it can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).

At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)], for all k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l-k} · [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)], for all k = 0, ..., l

Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) - J(X_k)]
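As a minimal illustration of the TD(0) update, consider an invented deterministic two-state chain (0 → 1 → terminal, with transition costs 1 and 2), evaluated with per-state step sizes γ_x = 1/(number of visits to x). The chain and all numbers are made up for the example; the true costs-to-go are J(0) = 3 and J(1) = 2.

```python
# Sketch: TD(0) policy evaluation on a toy deterministic chain
# 0 -> 1 -> T with transition costs 1 and 2 (exact values: J(0)=3, J(1)=2).
J = {0: 0.0, 1: 0.0, "T": 0.0}     # terminal state T has cost-to-go 0
visits = {0: 0, 1: 0}

trajectory = [(0, 1, 1.0), (1, "T", 2.0)]   # (Xk, Xk+1, C(Xk, Xk+1))
for episode in range(1000):
    for x, x_next, c in trajectory:
        visits[x] += 1
        gamma = 1.0 / visits[x]              # step size 1/m at the m-th visit
        J[x] += gamma * (c + J[x_next] - J[x])

# TD(0) converges to the true cost-to-go of the policy.
assert abs(J[1] - 2.0) < 1e-9
assert abs(J[0] - 3.0) < 0.01
```

With these diminishing step sizes, each estimate is a running average of its one-step targets, which is why the estimates settle on the exact values.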
Q-factors
Once J^μ(i) has been estimated using the TD algorithm, a policy improvement can be made by evaluating the Q-factors, defined by

Q^μ(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J^μ(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈ΩU(i)} Q^μ(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^μ and Q^μ have been estimated from the samples.
7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)    (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]    (7.3)

Q*(i, u) is the unique solution of this equation, and the Q-learning algorithm is based on (7.3). Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u∈ΩU(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 - γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈ΩU(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
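A minimal Q-learning sketch on an invented deterministic MDP. For simplicity, exploration is replaced by an exhaustive sweep over all (state, action) pairs, so every pair is tried repeatedly; the states, actions, costs and step sizes are illustrative assumptions only.

```python
# Sketch: Q-learning on a toy deterministic MDP.
# State 0: action 0 -> terminal (cost 1.0), action 1 -> state 1 (cost 0.5)
# State 1: action 0 -> terminal (cost 0.2); terminal Q-values are 0.
model = {(0, 0): ("T", 1.0), (0, 1): (1, 0.5), (1, 0): ("T", 0.2)}
Q = {sa: 0.0 for sa in model}
visits = {sa: 0 for sa in model}

def min_q(s):
    """min over admissible actions of Q(s, .); 0 at the terminal state."""
    vals = [Q[(ss, a)] for (ss, a) in Q if ss == s]
    return min(vals) if vals else 0.0

for sweep in range(50):                 # exhaustive exploration of all pairs
    for (s, a), (s_next, c) in model.items():
        visits[(s, a)] += 1
        gamma = 1.0 / visits[(s, a)]    # step size as for TD
        Q[(s, a)] = (1 - gamma) * Q[(s, a)] + gamma * (c + min_q(s_next))

# Greedy policy in state 0: take action 1 (total cost 0.7 < 1.0).
assert min((a for (s, a) in Q if s == 0), key=lambda a: Q[(0, a)]) == 1
```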
The exploration/exploitation trade-off. Convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section for each sample of experience;
- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation, using direct learning.
7.4 Supervised Learning

With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; however, for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jμ(i). It is replaced by a suitable approximation J~(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the tabular representation previously investigated, Jμ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) - J~(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function, and a corresponding supervised learning method.
• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.
• Decide on a training algorithm.
• Gather a training set.
• Train the function with the training set. The function can then be validated using a subset of the training set.
• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training set is obtained either by simulation or from real-time samples; this is already an approximation of the real function.
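A minimal sketch of this idea: a hypothetical linear-in-parameters architecture J̃(i, r) = φ(i)ᵀr is fitted by least squares to noisy simulated samples of a cost-to-go function. The feature map and the sample generator below are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(i):
    """Feature vector phi(i) for state i (here: simple polynomial features)."""
    return np.array([1.0, i, i ** 2])

# Pretend these samples of J_mu(i) come from simulating the policy mu
states = rng.integers(0, 100, size=500)
J_samples = 0.5 * states + 0.002 * states ** 2 + rng.normal(0.0, 1.0, size=500)

# Least-squares fit of r: minimize sum over samples of (J_mu(i) - phi(i)^T r)^2
Phi = np.stack([phi(i) for i in states])
r, *_ = np.linalg.lstsq(Phi, J_samples, rcond=None)

# Only the vector r is stored, not a table of J values for every state
J_tilde = lambda i: phi(i) @ r
print(r)
```

The fitted vector r recovers the underlying coefficients up to noise, so J̃(i, r) generalizes to states that were never sampled.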
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each is modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components, with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Model | Characteristics | Possible application in maintenance optimization | Methods | Advantages / disadvantages
Finite Horizon Dynamic Programming | Model can be non-stationary | Short-term maintenance scheduling | Value Iteration | Limited state space (number of components)
Markov Decision Processes | Stationary model | | Classical methods for MDP: |
MDP, average cost-to-go | | Continuous-time condition monitoring maintenance optimization | Value Iteration (VI) | Can converge fast for high discount factor
MDP, discounted | | Short-term maintenance optimization | Policy Iteration (PI) | Faster in general
MDP, shortest path | | | Linear Programming | Possible additional constraints; state space more limited than with VI and PI
Semi-Markov Decision Processes | Can optimize inspection interval | Optimization for inspection-based maintenance | Same as MDP | Complex (average cost-to-go approach)
Approximate Dynamic Programming for MDP | Can handle large state space | Same as classical MDP methods, for larger systems | TD-learning, Q-learning | Can work without an explicit model
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance in a profitable period. This idea was considered in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power; the electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE: Number of electricity scenarios
NW: Number of working states for the component
NPM: Number of preventive maintenance states for one component
NCM: Number of corrective maintenance states for one component
Costs
CE(s, k): Electricity cost at stage k for the electricity state s
CI: Cost per stage for interruption
CPM: Cost per stage of preventive maintenance
CCM: Cost per stage of corrective maintenance
CN(i): Terminal cost if the component is in state i
Variables
i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage
State and Control Space
x1k: Component state at stage k
x2k: Electricity state at stage k
Probability function
λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state Wi
Sets
Ωx1: Component state space
Ωx2: Electricity state space
ΩU(i): Decision space for state i
State notations
W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1,  x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undertaken preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50%. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, rounded to the closest integer.
Figure 9.1: Example of Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0; dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios (electricity prices in SEK/MWh over the stages), NE = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system.
Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and ΩU(i) = ∅ otherwise.
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).
The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1 | u | j1 | P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | λ(Wq)
WNW | 0 | WNW | 1 − λ(WNW)
WNW | 0 | CM1 | λ(WNW)
Wq, q ∈ {0, ..., NW} | 1 | PM1 | 1
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | 1
PMNPM−1 | ∅ | W0 | 1
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | 1
CMNCM−1 | ∅ | W0 | 1
Table 9.2: Example of transition matrices for the electricity scenarios

P1E =
1 0 0
0 1 0
0 0 1

P2E =
1/3 1/3 1/3
1/3 1/3 1/3
1/3 1/3 1/3

P3E =
0.6 0.2 0.2
0.2 0.6 0.2
0.2 0.2 0.6
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
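For illustration, such stage-dependent matrices can be used directly to simulate a scenario path over the horizon. The sketch below assumes the matrices of Table 9.2 and the schedule of Table 9.3; the random seed and the starting scenario are arbitrary choices, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Transition matrices of Table 9.2 (rows: current scenario i2, columns: next j2)
P1E = np.eye(3)
P2E = np.full((3, 3), 1 / 3)
P3E = np.array([[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]])
PE = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]  # Table 9.3

s = 0                                   # start in scenario S1
path = [s]
for k in range(12):
    s = int(rng.choice(3, p=PE[k][s]))  # scenario transition at stage k
    path.append(s)
print(path)
```

Note that with this schedule the scenario cannot change during the first three stages (identity matrices), which reflects the stable-then-transient assumption of Section 9.1.1.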
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
A possible terminal cost CN(i) is defined for each possible terminal state i of the component.
Table 9.4: Transition costs

i1 | u | j1 | Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | CI + CCM
WNW | 0 | WNW | G · Ts · CE(i2, k)
WNW | 0 | CM1 | CI + CCM
Wq | 1 | PM1 | CI + CPM
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | CI + CPM
PMNPM−1 | ∅ | W0 | CI + CPM
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | CI + CCM
CMNCM−1 | ∅ | W0 | CI + CCM
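To make the model concrete, the sketch below solves the one-component model by backward value iteration over the transition and cost structure of Tables 9.1 and 9.4, with the state encoding of Figure 9.1 (NW = 4, NPM = 2, NCM = 3). All numbers (failure probabilities, costs, prices) are hypothetical and do not come from the thesis; production rewards enter as negative costs.

```python
import numpy as np

N = 12                                  # stages
Ts = 730.0                              # stage length [h]
G = 1.0                                 # average production [kW]
CI, CPM, CCM = 40.0, 20.0, 60.0         # interruption / maintenance costs per stage
NW = 4
lam = [0.01, 0.03, 0.08, 0.15, 0.30]    # lambda(Wq): per-stage failure prob, q = 0..NW

# Component states: W0..W4 -> 0..4, PM1 -> 5, CM1 -> 6, CM2 -> 7
PM1, CM1, CM2 = 5, 6, 7
WORK = range(NW + 1)
n_comp = 8

# Electricity scenarios (rows: high/medium/low price), price per kWh at each stage
CE = np.array([[0.50] * N, [0.35] * N, [0.25] * N])
P1 = np.eye(3)
P2 = np.full((3, 3), 1 / 3)
P3 = np.array([[.6, .2, .2], [.2, .6, .2], [.2, .2, .6]])
PE = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]    # as in Table 9.3

def transitions(i1, u):
    """List of (j1, prob, kind) for the component state, following Table 9.1."""
    if i1 in WORK and u == 0:
        j_ok = min(i1 + 1, NW)                       # age saturates at W_NW
        return [(j_ok, 1 - lam[i1], 'W'), (CM1, lam[i1], 'CM')]
    if i1 in WORK and u == 1:
        return [(PM1, 1.0, 'PM')]
    if i1 == PM1:
        return [(0, 1.0, 'PM')]                      # last PM stage -> W0
    if i1 == CM1:
        return [(CM2, 1.0, 'CM')]
    return [(0, 1.0, 'CM')]                          # CM2 -> W0

def stage_cost(kind, i2, k):
    """Transition cost following Table 9.4 (reward as negative cost)."""
    if kind == 'PM':
        return CI + CPM
    if kind == 'CM':
        return CI + CCM
    return -G * Ts * CE[i2, k]

J = np.zeros((N + 1, n_comp, 3))                     # terminal cost CN(i) = 0
policy = np.zeros((N, n_comp, 3), dtype=int)
for k in range(N - 1, -1, -1):
    for i1 in range(n_comp):
        for i2 in range(3):
            candidates = []
            for u in ((0, 1) if i1 in range(1, NW + 1) else (0,)):
                v = 0.0
                for j1, p, kind in transitions(i1, u):
                    c = stage_cost(kind, i2, k)
                    v += p * sum(PE[k][i2, j2] * (c + J[k + 1, j1, j2])
                                 for j2 in range(3))
                candidates.append((v, u))
            J[k, i1, i2], policy[k, i1, i2] = min(candidates)

print(J[0, 0, 0])    # expected total cost for a new component in scenario S1
```

The optimal policy is read from `policy[k, i1, i2]`; with a terminal cost CN(i) other than zero, only the initialization of J[N] would change.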
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.
This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rent can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC: Number of components
NWc: Number of working states for component c
NPMc: Number of preventive maintenance states for component c
NCMc: Number of corrective maintenance states for component c
Costs
CPMc: Cost per stage of preventive maintenance for component c
CCMc: Cost per stage of corrective maintenance for component c
CNc(i): Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}: State of component c at the current stage
iNC+1: Electricity state at the current stage
jc, c ∈ {1, ..., NC}: Possible state of component c for the next stage
jNC+1: Possible electricity state for the next stage
uc, c ∈ {1, ..., NC}: Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}: State of component c at stage k
xc: A component state
xNC+1k: Electricity state at stage k
uck: Maintenance decision for component c at stage k
Probability functions
λc(i): Failure probability function for component c
Sets
Ωxc: State space for component c
ΩxNC+1: Electricity state space
Ωuc(ic): Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component, to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.
Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity Space
Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system.
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and Ωuc(ic) = ∅ otherwise.
9.2.4.3 Transition Probability
The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc},

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)
Case 2

If one of the components is in maintenance, or if preventive maintenance is decided for at least one component:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC Pc

with

Pc = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
Pc = 1             if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic
Pc = 0             otherwise

(a working component on which no maintenance is done does not age while the system is stopped)
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc},

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC Cc

with

Cc = CCMc  if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc  if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0     otherwise
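A minimal sketch of the Case 1 factorization above, with two hypothetical components: while the whole system works and no maintenance is decided, each component independently either ages one stage or fails. The failure probabilities and state encoding are invented for illustration.

```python
lam = [0.05, 0.10]   # hypothetical per-stage failure probabilities lambda_c

def p_comp(c, jc, ic):
    """P(jc | u=0, ic) for component c in working state W_ic: ages or fails."""
    if jc == ic + 1:          # ages to W_{ic+1}
        return 1 - lam[c]
    if jc == 'CM1':           # fails -> first corrective maintenance state
        return lam[c]
    return 0.0

# Joint probability that component 1 ages W0 -> W1 and component 2 ages W1 -> W2:
# the product (1 - lambda_1) * (1 - lambda_2)
p_joint = p_comp(0, 1, 0) * p_comp(1, 2, 1)
print(p_joint)
```

As soon as a failure or a maintenance decision occurs, Case 2 applies instead and the non-maintained working components keep their state with probability 1.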
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:
• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.
• Include other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: it is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid untractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields, such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states, to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6,  u*2(0) = u*(E) = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5,  u*2(1) = u*(F) = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3,  u*2(2) = u*(G) = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10,  u*1(0) = u*(B) = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6,  u*1(1) = u*(C) = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5,  u*1(2) = u*(D) = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8,  u*0(0) = u*(A) = 2
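The backward recursion can also be checked numerically. The sketch below re-runs the value iteration over the arc costs C(k, i, j) read off the example above, with each decision u taken as the index of the successor state.

```python
# Arc costs C[(k, i, j)]: cost of going from state i at stage k to state j
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}
J = {(4, 0): 0.0}          # terminal cost phi(0) = 0
u = {}
for k in range(3, -1, -1):                            # backward over the stages
    for i in sorted({i for (kk, i, _) in C if kk == k}):
        best_v, best_j = float('inf'), None
        for (kk, ii, j), c in C.items():
            if (kk, ii) == (k, i):
                v = c + J[(k + 1, j)]                 # arc cost + cost-to-go
                if v < best_v:
                    best_v, best_j = v, j
        J[(k, i)], u[(k, i)] = best_v, best_j

print(J[(0, 0)], u[(0, 0)])   # 8.0 2
```

This reproduces J*(A) = 8 with the first decision u*(A) = 2, matching the hand computation.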
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Computers & Operations Research, 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS '06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of the 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA '99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
7.2.1 Policy Evaluation using Temporal Differences
Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6, and can be seen as similar to modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average-cost-to-go problems.
Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and that the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state Xk, is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

where V(Xk) denotes the cost-to-go of a trajectory starting from state Xk.
If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

where V(i_m) is the cost-to-go of the trajectory starting from state i after the m-th visit.
A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(Xk) := J(Xk) + γ_{Xk} · [V(Xk) − J(Xk)]

where γ_{Xk} corresponds to 1/m, m being the number of times Xk has already been visited by trajectories.
With the preceding algorithm, V(Xk) must be calculated from the whole trajectory, so the updates can only be made once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).
At each transition of the trajectory, the cost-to-go estimates J(Xk) of the states already visited are updated. Assume that the l-th transition is being generated; then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) := J(Xk) + γ_{Xk} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l
TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J(Xk) := J(Xk) + γ_{Xk} · λ^{l−k} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0. The TD(0) algorithm is

J(Xk) := J(Xk) + γ_{Xk} · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]

that is, only the state from which the current transition originates is updated.
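As a minimal illustration, the TD(0) update can be simulated on a hypothetical two-state stochastic shortest path problem; the transition probabilities and costs below are invented for the sketch, and the step size γ = 1/m follows the text:

```python
import random

# TD(0) evaluation of a fixed policy on a hypothetical chain: from state 0
# or 1 the system advances one step towards the terminal state 2 with
# probability 0.8 (otherwise it stays), and every transition costs 1.
# The exact cost-to-go is J(1) = 1/0.8 = 1.25 and J(0) = 2.5.
random.seed(0)
TERMINAL = 2
J = [0.0, 0.0, 0.0]            # estimates; J[TERMINAL] remains 0
visits = [0, 0, 0]

for _ in range(5000):          # many simulated trajectories
    x = 0
    while x != TERMINAL:
        nxt = x + 1 if random.random() < 0.8 else x
        cost = 1.0
        visits[x] += 1
        gamma = 1.0 / visits[x]                    # step size 1/m
        J[x] += gamma * (cost + J[nxt] - J[x])     # TD(0) update
        x = nxt

print(J[0], J[1])   # close to the exact values 2.5 and 1.25
```

The estimates approach the exact cost-to-go as the number of trajectories grows, without ever storing whole trajectories.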
Q-factors. Once J^{μk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement step by evaluating the Q-factors, defined by

Q^{μk}(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J^{μk}(j)]

Note that the transition probabilities P(j, u, i) and costs C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈ΩU(i)} Q^{μk}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^{μk} and Q^{μk} have been estimated using the samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.
The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)   (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]   (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3). Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do:

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
The trade-off between exploration and exploitation. Convergence of the algorithm to the optimal solution requires that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called the greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
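A compact sketch of Q-learning with an ε-greedy exploration/exploitation trade-off follows; the two-state model, its costs and the values ε = 0.3, γ = 0.5 are all invented for the illustration. For this data the optimal Q-factors can be computed by hand: Q*(0, ·) = (2, 3) and Q*(1, ·) = (1, 2.5).

```python
import random

# Q-learning with epsilon-greedy exploration on a small hypothetical model.
# Transitions (deterministic here): state 0, action 0 -> state 1, cost 1;
# state 0, action 1 -> terminal, cost 3; state 1, action 0 -> terminal,
# cost 1; state 1, action 1 -> state 0, cost 0.5.
random.seed(1)
TERMINAL, EPS, GAMMA = 2, 0.3, 0.5
model = {(0, 0): (1, 1.0), (0, 1): (TERMINAL, 3.0),
         (1, 0): (TERMINAL, 1.0), (1, 1): (0, 0.5)}
Q = {0: [0.0, 0.0], 1: [0.0, 0.0]}

def q_min(x):                  # min_u Q(x, u); zero at the terminal state
    return 0.0 if x == TERMINAL else min(Q[x])

for episode in range(2000):
    x = episode % 2            # alternate the starting state
    while x != TERMINAL:
        if random.random() < EPS:                      # exploration phase
            u = random.randrange(2)
        else:                                          # exploitation (greedy)
            u = min(range(2), key=lambda a: Q[x][a])
        nxt, cost = model[(x, u)]
        # Q-learning update, gamma playing the role of the step size
        Q[x][u] = (1 - GAMMA) * Q[x][u] + GAMMA * (cost + q_min(nxt))
        x = nxt

print(Q[0], Q[1])   # approaches [2.0, 3.0] and [1.0, 2.5]
```

Because exploration keeps trying both actions in both states, the estimates converge to the optimal Q-factors even though no transition probabilities are ever given to the algorithm.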
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience; or

- building a model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation, using direct learning.
7.4 Supervised Learning
With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; for large state and control spaces, however, they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jμ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the table representation previously investigated, Jμ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, the error between the true function and the approximated one, Jμ(i) − J(i, r), should be minimized.

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.
A general approach to a supervised learning problem can be
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the one performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
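As a toy illustration of the supervised-learning step, a linear architecture J(i, r) = r0 + r1·i can be fitted by least squares to noisy cost-to-go samples; the "true" function J(i) = 2i + 1 and the noise level are invented for the sketch:

```python
import random

# Supervised-learning step of approximate DP: fit the linear architecture
# J(i, r) = r0 + r1 * i to simulated cost-to-go samples.  The underlying
# function 2*i + 1 and the noise are hypothetical training data.
random.seed(2)
train = [(i, 2.0 * i + 1.0 + random.gauss(0.0, 0.1)) for i in range(20)]

# Ordinary least squares for the two parameters (closed form).
n = len(train)
sx = sum(i for i, _ in train); sy = sum(v for _, v in train)
sxx = sum(i * i for i, _ in train); sxy = sum(i * v for i, v in train)
r1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
r0 = (sy - r1 * sx) / n

# Only the vector r = (r0, r1) is stored, not a table of J(i) values.
print(round(r0, 2), round(r1, 2))   # close to 1.0 and 2.0
```

The point of the example is the storage saving: twenty table entries are replaced by two parameters, at the cost of an approximation error that the training step minimizes.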
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling-horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process model of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built, using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are not only minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantage put forward is the automatic learning capability of RL. The problem of time-lag (the delay between an action and its effect) is pointed out. Penalties are defined for deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; it could then be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality: it means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go criterion, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; they are, however, also more complex. The models found in the literature consider only single components with one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. For continuous-time monitoring, however, approximate methods would be recommended.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require a model of the system to exist: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary.
  Possible application in maintenance optimization: short-term maintenance scheduling.
  Method: value iteration.
  Disadvantage: limited state space (number of components).

Markov Decision Processes
  Characteristics: stationary model. Methods: classical methods for MDP.
  Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor.
  Discounted: short-term maintenance optimization; policy iteration (PI) is faster in general.
  Shortest path: linear programming allows possible additional constraints; the state space is limited, as for VI and PI.

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval.
  Possible application: optimization of inspection-based maintenance.
  Methods: same as MDP.
  Disadvantage: complex (average cost-to-go approach).

Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than the classical MDP methods.
  Possible application: same as MDP, for larger systems.
  Methods: TD-learning, Q-learning.
  Advantage: can work without an explicit model.
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component, and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices. Conversely, if a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, so as to be operational later and avoid maintenance during a profitable period. This idea was adopted for the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for one component
NCM   Number of corrective maintenance states for one component

Costs

CE(s, k)  Electricity cost at stage k for the electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1_k  Component state at stage k
x2_k  Electricity state at stage k

Probability functions

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets

Ωx1    Component state space
Ωx2    Electricity state space
ΩU(i)  Decision space for state i

States notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption of CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal state condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario; NX = 2. The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),  x1_k ∈ Ωx1, x2_k ∈ Ωx2   (9.1)
Ωx1 is the set of possible states for the component and Ωx2 the set of possibleelectricity scenarios
Component state
The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can for example correspond to the time after which λ(t) exceeds 50%. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
Figure 9.1: Example of Markov decision process for one component, with NCM = 3, NPM = 2 and NW = 4 (solid lines: u = 0; dashed lines: u = 1). [Figure omitted: states W0–W4, PM1, CM1 and CM2; from each Wq the transition to the next working state has probability 1 − Ts·λ(q) and the transition to CM1 has probability Ts·λ(q); the PM and CM chains and the preventive maintenance decisions have probability 1.]
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., W_NW, PM1, ..., PM_(NPM−1), CM1, ..., CM_(NCM−1)}
Electricity scenario state
Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios, corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden, where hydro power is a large part of the electricity generation; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3. [Figure omitted: electricity prices (SEK/MWh, between 200 and 500) for Scenarios 1–3 over stages k−1, k and k+1.]
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., W_NW}, and ΩU(i) = ∅ otherwise.
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                         | u | j1    | P(j1, u, i1)
Wq, q ∈ {0, ..., NW-1}     | 0 | Wq+1  | 1 - λ(Wq)
Wq, q ∈ {0, ..., NW-1}     | 0 | CM1   | λ(Wq)
W_NW                       | 0 | W_NW  | 1 - λ(W_NW)
W_NW                       | 0 | CM1   | λ(W_NW)
Wq, q ∈ {0, ..., NW}       | 1 | PM1   | 1
PMq, q ∈ {1, ..., NPM-2}   | ∅ | PMq+1 | 1
PM_{NPM-1}                 | ∅ | W0    | 1
CMq, q ∈ {1, ..., NCM-2}   | ∅ | CMq+1 | 1
CM_{NCM-1}                 | ∅ | W0    | 1
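As an illustration, the stationary component transitions of Table 9.1 (for u = 0, together with the forced moves of the maintenance states) can be assembled into a matrix. This is a sketch with assumed sizes and a hypothetical failure rate function, not code from the thesis.

```python
# Sketch (not thesis code): build the stationary component transition matrix
# of Table 9.1 for u = 0, including the forced PM/CM chains. Sizes and the
# failure probability function lam(q) are assumed values.

def component_matrix_no_pm(lam, n_w, n_pm, n_cm):
    # State ordering: W0..W_NW, PM1..PM_{NPM-1}, CM1..CM_{NCM-1}
    states = ([f"W{q}" for q in range(n_w + 1)]
              + [f"PM{q}" for q in range(1, n_pm)]
              + [f"CM{q}" for q in range(1, n_cm)])
    idx = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for q in range(n_w + 1):
        # W_NW does not age further; the other W states age by one step
        nxt = f"W{q + 1}" if q < n_w else f"W{n_w}"
        P[idx[f"W{q}"]][idx[nxt]] += 1.0 - lam(q)
        P[idx[f"W{q}"]][idx["CM1"]] += lam(q)       # random failure
    for q in range(1, n_pm):                         # PM chain back to W0
        nxt = f"PM{q + 1}" if q < n_pm - 1 else "W0"
        P[idx[f"PM{q}"]][idx[nxt]] = 1.0
    for q in range(1, n_cm):                         # CM chain back to W0
        nxt = f"CM{q + 1}" if q < n_cm - 1 else "W0"
        P[idx[f"CM{q}"]][idx[nxt]] = 1.0
    return states, P

states, P = component_matrix_no_pm(lambda q: 0.1 * (q + 1), n_w=2, n_pm=2, n_cm=2)
assert all(abs(sum(row) - 1.0) < 1e-9 for row in P)  # each row is a distribution
```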
Table 9.2: Example of transition matrices for the electricity scenarios

P1E =
| 1    0    0   |
| 0    1    0   |
| 0    0    1   |

P2E =
| 1/3  1/3  1/3 |
| 1/3  1/3  1/3 |
| 1/3  1/3  1/3 |

P3E =
| 0.6  0.2  0.2 |
| 0.2  0.6  0.2 |
| 0.2  0.2  0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)  | 0   1   2   3   4   5   6   7   8   9   10  11
Pk(j2, i2) | P1E P1E P1E P3E P3E P2E P2E P2E P3E P1E P1E P1E
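The stage-dependent electricity transitions of Tables 9.2 and 9.3 amount to a simple per-stage lookup. A minimal sketch, using the values from the tables:

```python
# Sketch: the electricity transition matrices of Table 9.2 and the
# stage-to-matrix assignment of Table 9.3.
P1_E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2_E = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
P3_E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Table 9.3: which matrix applies at each of the 12 stages.
schedule = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E,
            P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def p_elec(k, j2, i2):
    """P_k(j2, i2): probability of moving from scenario i2 to scenario j2
    at stage k (rows = current state i2, columns = next state j2)."""
    return schedule[k][i2][j2]

assert p_elec(0, 2, 2) == 1.0            # stage 0 uses the identity matrix P1_E
assert abs(p_elec(3, 0, 0) - 0.6) < 1e-9 # stage 3 uses P3_E
```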
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI
Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i1) for each possible terminal state i1 of the component.
Table 9.4: Transition costs

i1                         | u | j1    | Ck(j, u, i)
Wq, q ∈ {0, ..., NW-1}     | 0 | Wq+1  | G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW-1}     | 0 | CM1   | CI + CCM
W_NW                       | 0 | W_NW  | G · Ts · CE(i2, k)
W_NW                       | 0 | CM1   | CI + CCM
Wq                         | 1 | PM1   | CI + CPM
PMq, q ∈ {1, ..., NPM-2}   | ∅ | PMq+1 | CI + CPM
PM_{NPM-1}                 | ∅ | W0    | CI + CPM
CMq, q ∈ {1, ..., NCM-2}   | ∅ | CMq+1 | CI + CCM
CM_{NCM-1}                 | ∅ | W0    | CI + CCM
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.
This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC: number of components
NWc: number of working states for component c
NPMc: number of preventive maintenance states for component c
NCMc: number of corrective maintenance states for component c
Costs
CPMc: cost per stage of preventive maintenance for component c
CCMc: cost per stage of corrective maintenance for component c
CNc(i): terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}: state of component c at the current stage
iNC+1: electricity state at the current stage
jc, c ∈ {1, ..., NC}: state of component c at the next stage
jNC+1: electricity state at the next stage
uc, c ∈ {1, ..., NC}: decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}: state of component c at stage k
xc: a component state
xNC+1,k: electricity state at stage k
uck: maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc: state space for component c
ΩxNC+1: electricity state space
Ωuc(ic): decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh is produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector, as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.
Component Space
The numbers of CM and PM states for component c are NCMc and NPMc, respectively. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc-1}, CM1, ..., CM_{NCMc-1}}
Electricity Space
Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector
Uk = (u1k, u2k, ..., uNCk)    (9.3)
The decision space for each decision variable can be defined by
∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., W_NWc}, and ∅ otherwise.
9.2.4.3 Transition Probability
The state variables xc are independent of the electricity state xNC+1. Consequently,
P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions
The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.
Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)
Case 2
If one of the components is in maintenance, or preventive maintenance is decided for it, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with P^c =
  P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., W_NWc}
  1              if uc = 0, ic ∈ {W1, ..., W_NWc} and jc = ic (the component does not age)
  0              otherwise
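For illustration, the two cases can be combined into one routine. The sketch below is not from the thesis; the per-component transition function p_c, the string state encoding and the helper working() are hypothetical stand-ins.

```python
# Sketch (not thesis code): system transition probability combining the two
# cases above. p_comp[c](j, u, i) and the state encoding are hypothetical.

def system_transition(p_comp, working, i, j, u):
    n_c = len(i)
    if all(working(c, i[c]) for c in range(n_c)) and not any(u):
        # Case 1: all components working, no maintenance decided:
        # the components age independently.
        prob = 1.0
        for c in range(n_c):
            prob *= p_comp[c](j[c], 0, i[c])
        return prob
    # Case 2: the system is down; components in (or sent to) maintenance
    # move through their maintenance chain, the others do not age.
    prob = 1.0
    for c in range(n_c):
        if u[c] == 1 or not working(c, i[c]):
            prob *= p_comp[c](j[c], 1, i[c])
        else:
            prob *= 1.0 if j[c] == i[c] else 0.0
    return prob

def working(c, s):
    return s.startswith("W")

def p_c(jc, uc, ic):
    # Hypothetical one-component transition table (one working state, one
    # stage of corrective or preventive maintenance).
    if ic == "W1" and uc == 0:
        return {"W1": 0.8, "CM1": 0.2}.get(jc, 0.0)   # ageing / random failure
    if ic == "W1" and uc == 1:
        return 1.0 if jc == "PM1" else 0.0            # replacement starts
    if ic in ("CM1", "PM1"):
        return 1.0 if jc == "W1" else 0.0             # maintenance finishes
    return 0.0

# All components working, no maintenance: independent ageing (0.8 * 0.8)
assert abs(system_transition([p_c, p_c], working, ("W1", "W1"),
                             ("W1", "W1"), (0, 0)) - 0.64) < 1e-9
```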
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.
Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)
Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} C^c

with C^c =
  CCMc   if ic ∈ {CM1, ..., CM_NCMc} or jc = CM1
  CPMc   if ic ∈ {PM1, ..., PM_NPMc} or jc = PM1
  0      otherwise
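The corresponding stage cost can be sketched the same way. This is an illustration under assumed names; the state encoding is hypothetical, and the production reward (G · Ts · CE in the text) is simply passed in by the caller, with the sign convention left to the caller.

```python
# Sketch (not thesis code): stage cost for the multi-component model,
# mirroring the two cases above. The "W.." / "PM.." / "CM.." state encoding
# and argument names are hypothetical.

def stage_cost(i, j, u, c_cm, c_pm, c_i, production_reward):
    n_c = len(u)
    # Case 1: all components working, no maintenance decided, no failure
    if (all(s.startswith("W") for s in i) and not any(u)
            and all(s.startswith("W") for s in j)):
        return production_reward      # stands for G * Ts * CE(i_el, k)
    # Case 2: interruption cost plus the per-component maintenance costs
    cost = c_i
    for c in range(n_c):
        if i[c].startswith("CM") or j[c] == "CM1":
            cost += c_cm[c]           # corrective maintenance for component c
        elif i[c].startswith("PM") or j[c] == "PM1":
            cost += c_pm[c]           # preventive maintenance for component c
    return cost

# Component 2 under repair: interruption cost plus its CM cost
assert stage_cost(("W1", "CM1"), ("W1", "W1"), (0, 0),
                  (10.0, 20.0), (5.0, 5.0), 100.0, -500.0) == 120.0
```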
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:
• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality, there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is non-deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
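The backward recursion can be checked with a short script. The arc costs C(k, i, u) below are the ones used in the computation above; decision u at stage k is read as "go to node u of stage k+1", with the terminal node reached after stage 3.

```python
# Sketch: backward value iteration reproducing the appendix computation.
# C[(k, i, u)] is the arc cost of decision u in state i at stage k.
C = {
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
}

J = {(4, 0): 0.0}           # terminal cost J*(4, 0) = 0
policy = {}
for k in (3, 2, 1, 0):
    for i in {i for (kk, i, u) in C if kk == k}:
        # all stage-3 decisions lead to the single terminal node
        options = {u: C[(k, i, u)] + J[(k + 1, u if k < 3 else 0)]
                   for (kk, ii, u) in C if (kk, ii) == (k, i)}
        u_best = min(options, key=options.get)
        J[(k, i)], policy[(k, i)] = options[u_best], u_best

assert J[(0, 0)] == 8.0     # J*(A) = 8, as computed above
```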
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] A-H. Mohamed. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435-441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464-469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75-83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156-162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467-476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15-24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157-179, 1979.
[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75-82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452-456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1-23, 1991.
[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411-435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533-537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179-186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387-393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150-155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145-149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.
[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507-515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117-118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31-37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223-229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293-294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556-560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-6, 2006.
[38] Rangan Alagar, Thyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176-1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167-173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23-28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469-489, 2002.
[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
With the preceding algorithm, V(Xk) is calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).
At each transition of the trajectory, the cost-to-go function J(Xk) of the states of the trajectory is updated. Assume that the l-th transition is being generated; then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) := J(Xk) + γXk · [C(Xl, Xl+1) + J(Xl+1) - J(Xl)], ∀k = 0, ..., l
TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) := J(Xk) + γXk · λ^(l-k) · [C(Xl, Xl+1) + J(Xl+1) - J(Xl)], ∀k = 0, ..., l
Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J(Xk) := J(Xk) + γXk · [C(Xk, Xk+1) + J(Xk+1) - J(Xk)]
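The updates above can be sketched in tabular form. This is an illustration under assumed names (td_lambda_episode is not from the thesis); a constant step size γ is used, and λ = 0 recovers the TD(0) update.

```python
# Sketch (not thesis code): tabular TD(lambda) along one recorded trajectory,
# following the update rule above with a constant step size gamma.

def td_lambda_episode(J, trajectory, costs, gamma, lam):
    """trajectory: visited states X0..XN; costs[l] = C(X_l, X_{l+1}).
    After each transition l, all previously visited states are updated,
    weighted by lam ** (l - k)."""
    for l in range(len(trajectory) - 1):
        x_l, x_next = trajectory[l], trajectory[l + 1]
        delta = costs[l] + J[x_next] - J[x_l]       # temporal difference
        for k in range(l + 1):                       # states visited so far
            J[trajectory[k]] += gamma * lam ** (l - k) * delta
    return J

# TD(0) special case: only the state of the current transition is updated.
J = {"A": 0.0, "B": 0.0, "T": 0.0}
td_lambda_episode(J, ["A", "B", "T"], [2.0, 3.0], gamma=0.5, lam=0.0)
```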
Q-factors
Once Jμk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Qμk(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jμk(j)]

Note that C(j, u, i) must be known. The improved policy is

μk+1(i) = argmin_{u∈ΩU(i)} Qμk(i, u)

This is in fact an approximate version of the policy iteration algorithm, since Jμk and Qμk have been estimated from samples.
7.2.2 Q-learning
Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by
Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)    (7.2)
By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]    (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (Xk, Uk, Xk+1, Ck), do

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 - γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
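A tabular sketch of this update rule, with an ε-greedy exploration/exploitation trade-off as discussed next. The three-node shortest-path environment and all function names are hypothetical, not thesis code.

```python
import random

# Sketch (not thesis code): tabular Q-learning with the update rule above,
# alternating greedy exploitation with occasional random exploration.

def q_learning(step, states, actions, gamma, episodes, horizon, seed=0):
    rng = random.Random(seed)
    Q = {(x, u): 0.0 for x in states for u in actions}
    for _ in range(episodes):
        x = rng.choice(states)
        for _ in range(horizon):
            if rng.random() < 0.2:                        # explore
                u = rng.choice(actions)
            else:                                         # exploit greedy policy
                u = min(actions, key=lambda a: Q[(x, a)])
            x_next, cost = step(x, u)
            target = cost + min(Q[(x_next, a)] for a in actions)
            Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
            x = x_next
    return Q

def step(x, u):
    # Hypothetical environment: node 0 -> node 1 (cost 1) or straight to the
    # goal (cost 5); node 1 -> goal (cost 1); the goal is absorbing, cost 0.
    if x == 0:
        return (1, 1.0) if u == 0 else (2, 5.0)
    if x == 1:
        return (2, 1.0)
    return (2, 0.0)

Q = q_learning(step, [0, 1, 2], [0, 1], gamma=0.1, episodes=300, horizon=10)
assert Q[(0, 0)] < Q[(0, 1)]    # the cheap route via node 1 is learned
```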
The exploration/exploitation trade-off. The convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience, or

- building on-line a model of the transition probabilities and the cost function, and then using this model for off-line training of the system through simulation, using direct learning.
7.4 Supervised Learning
With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces, they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jμ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the tabular representation investigated previously, Jμ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) - J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
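As a toy illustration of this workflow, the sketch below fits a linear approximation structure J̃(i, r) = r0 + r1·i to hypothetical samples of a cost-to-go function by least squares; the data and names are assumptions, not from the thesis.

```python
# Sketch: fit J~(i, r) = r[0] + r[1]*i to noisy samples of a cost-to-go
# function by ordinary least squares (hypothetical training data).

def fit_linear(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    r1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    r0 = mean_y - r1 * mean_x
    return r0, r1

# Training set: samples of J_mu(i), gathered e.g. by simulation.
states = [0, 1, 2, 3, 4]
samples = [1.0, 3.1, 4.9, 7.0, 9.1]        # roughly J(i) = 1 + 2i plus noise
r0, r1 = fit_linear(states, samples)
approx = lambda i: r0 + r1 * i             # J~(i, r): only r is stored

assert abs(approx(2) - 5.0) < 0.3          # generalizes near the samples
```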
An important difference between classical supervised learning and the one performed in reinforcement learning is that a real training set does not exist. The training set is obtained either by simulation or from real-time samples, which is already an approximation of the real function.
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Processes models havebeen proposed for solving condition based maintenance problems The models con-siders an average cost-to-go which is realistic SMDP have the advantages of beingable to optimize the time to next inspection depending on the states SMDP arealso more complex The models found in the litterature was considering only singlecomponents with only one state variable SMDP could be very useful for schedulledCBM and SMDP for inspection based CBM However for continuous time moni-toring it would be recommanded to use approximate methods
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require a model of the system to exist: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Application: short-term maintenance scheduling
  Method: value iteration
  Disadvantage: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; average cost-to-go, discounted, and shortest-path formulations
  Applications: continuous-time condition monitoring maintenance optimization; short-term maintenance optimization
  Classical methods:
    - Value iteration (VI): can converge fast for a high discount factor
    - Policy iteration (PI): faster in general
    - Linear programming: possible additional constraints; state space more limited than with VI & PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces
  Application: same as classical MDP methods, but for larger systems
  Methods: TD-learning, Q-learning
  Advantage: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Application: optimization for inspection-based maintenance
  Methods: same as MDP (average cost-to-go approach)
  Disadvantage: complex
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. One interesting feature of the model is the integration of the electricity price as a state variable; another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, to make its principle easier to understand.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to keep operating the system and wait for lower prices before doing maintenance.
Conversely, if a high electricity price is expected in the near future, it could be worthwhile to do maintenance immediately, so that the unit is operational later and maintenance during a profitable period is avoided. This idea was incorporated into the model: the electricity price was included as a state variable. The variable distinguishes different electricity scenarios, for example high, medium and low prices. Within each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE   Number of electricity scenarios
NW   Number of working states for the component
NPM  Number of preventive maintenance states for one component
NCM  Number of corrective maintenance states for one component
Costs
CE(s, k)  Electricity cost at stage k for the electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i
Variables
i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage
State and Control Space
x1k  Component state at stage k
x2k  Electricity state at stage k
Probability function
λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi
Sets
Ωx1    Component state space
Ωx2    Electricity state space
ΩU(i)  Decision space for state i
State notations
W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, an interruption cost CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. The electricity price may switch from one scenario to another during the time span; the probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k)T,  x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by the state variable x1k. There are three types of possible states for this variable: working (W) states when the component is operating, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it underwent preventive maintenance. The numbers of CM and PM states for the component are NCM and NPM, respectively.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can for example correspond to the time when λ(t) exceeds 50%. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
[Figure 9.1: Example of a Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1. Each working state Wq moves to Wq+1 (W4 to itself) with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); the transitions Wq → PM1 (u = 1) and the transitions along the PM and CM chains back to W0 have probability 1.]
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
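As a sanity check, this state space can be enumerated in code. The state labels follow Figure 9.1; the helper function itself is only an illustrative sketch.

```python
# Enumerate the component state space for given NW, NPM, NCM, following the
# convention that W0 also plays the role of the final PM and CM stages.

def component_state_space(n_w, n_pm, n_cm):
    states = ["W%d" % q for q in range(n_w + 1)]      # W0 .. WNW
    states += ["PM%d" % q for q in range(1, n_pm)]    # PM1 .. PM(NPM-1)
    states += ["CM%d" % q for q in range(1, n_cm)]    # CM1 .. CM(NCM-1)
    return states

print(component_state_space(4, 2, 3))
# -> ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2']
```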
Electricity scenario state
Electricity scenarios are associated with the state variable x2k. There are NE possible states for this variable, each corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden, where hydro power makes up a large part of the electricity generation and is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2: Example of electricity scenarios, NE = 3. Three price curves, one per scenario, are shown over the stages k−1, k, k+1, with electricity prices roughly between 200 and 500 SEK/MWh.]
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW};  ΩU(i) = ∅ otherwise
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u  j1     P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0  Wq+1   1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0  CM1    λ(Wq)
WNW                         0  WNW    1 − λ(WNW)
WNW                         0  CM1    λ(WNW)
Wq, q ∈ {0, ..., NW}        1  PM1    1
PMq, q ∈ {1, ..., NPM−2}    ∅  PMq+1  1
PMNPM−1                     ∅  W0     1
CMq, q ∈ {1, ..., NCM−2}    ∅  CMq+1  1
CMNCM−1                     ∅  W0     1
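A possible encoding of Table 9.1 in code is sketched below. Here p[q] stands for the per-stage failure probability of working state Wq (in the notation of the table, Ts·λ(Wq)); the numeric values of p are invented for the sketch, and u = None marks the forced PM/CM chain transitions.

```python
# Stationary component transitions of Table 9.1 as {(state, u): {next: prob}}.

def component_transitions(p, n_w, n_pm, n_cm):
    P = {}
    for q in range(n_w + 1):
        i = "W%d" % q
        stay_or_age = "W%d" % min(q + 1, n_w)     # WNW loops on itself
        P[(i, 0)] = {stay_or_age: 1.0 - p[q], "CM1": p[q]}
        P[(i, 1)] = {"PM1" if n_pm > 1 else "W0": 1.0}
    for q in range(1, n_pm - 1):
        P[("PM%d" % q, None)] = {"PM%d" % (q + 1): 1.0}
    if n_pm > 1:
        P[("PM%d" % (n_pm - 1), None)] = {"W0": 1.0}
    for q in range(1, n_cm - 1):
        P[("CM%d" % q, None)] = {"CM%d" % (q + 1): 1.0}
    if n_cm > 1:
        P[("CM%d" % (n_cm - 1), None)] = {"W0": 1.0}
    return P

p = [0.05, 0.10, 0.20, 0.35, 0.50]                # assumed Ts * lambda(q*Ts)
P = component_transitions(p, n_w=4, n_pm=2, n_cm=3)
assert all(abs(sum(d.values()) - 1.0) < 1e-12 for d in P.values())
```

Checking that every row of the transition kernel sums to one is a cheap consistency test before the model is fed to value iteration.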
Table 9.2: Example of transition matrices for the electricity scenarios

P1E =  | 1   0   0  |     P2E =  | 1/3 1/3 1/3 |     P3E =  | 0.6 0.2 0.2 |
       | 0   1   0  |            | 1/3 1/3 1/3 |            | 0.2 0.6 0.2 |
       | 0   0   1  |            | 1/3 1/3 1/3 |            | 0.2 0.2 0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage k       0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
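These two tables translate directly into data. The sketch below encodes the three matrices and the 12-stage schedule, and checks that every row is a probability distribution.

```python
# Electricity-scenario transition matrices and the stage-dependent schedule
# (stages 0..11), as in the example tables.
P1E = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
P2E = [[1/3, 1/3, 1/3]] * 3
P3E = [[0.6, 0.2, 0.2],
       [0.2, 0.6, 0.2],
       [0.2, 0.2, 0.6]]
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

for Pk in schedule:
    for row in Pk:
        assert abs(sum(row) - 1.0) < 1e-9     # each row sums to one
```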
9.1.4.4 Cost Function
The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost of maintenance: CCM or CPM
• Cost of interruption: CI

Moreover, a terminal cost CN(i), defined for each possible terminal component state i, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4: Transition costs

i1                          u  j1     Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0  Wq+1   G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0  CM1    CI + CCM
WNW                         0  WNW    G · Ts · CE(i2, k)
WNW                         0  CM1    CI + CCM
Wq                          1  PM1    CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅  PMq+1  CI + CPM
PMNPM−1                     ∅  W0     CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅  CMq+1  CI + CCM
CMNCM−1                     ∅  W0     CI + CCM
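Putting the state space, decision space, transition probabilities and cost function together, the model can be solved by backward induction (finite horizon value iteration). The sketch below follows the structure of Tables 9.1 and 9.4, but all numbers (failure probabilities, costs, prices, scenario transitions) are invented for illustration; the electricity reward enters as a negative cost, since the recursion minimizes.

```python
# Backward induction for the one-component model (assumed numbers throughout).
N = 12                                   # number of stages
G_TS = 100.0                             # G*Ts: kWh produced per working stage
C_I, C_PM, C_CM = 50.0, 30.0, 80.0       # interruption / PM / CM costs per stage
N_W = 4
p = [0.05, 0.10, 0.20, 0.35, 0.50]       # per-stage failure prob in W0..W4
price = [0.30, 0.50, 0.80]               # C_E per kWh for scenarios S1..S3
PE = [[0.80, 0.15, 0.05],                # scenario transitions (stationary here)
      [0.10, 0.80, 0.10],
      [0.05, 0.15, 0.80]]

comp_states = ["W0", "W1", "W2", "W3", "W4", "PM1", "CM1", "CM2"]

def moves(i1, u):
    """(next state, prob, kind) per Tables 9.1/9.4; kind selects the cost."""
    if i1.startswith("W"):
        q = int(i1[1:])
        if u == 1:
            return [("PM1", 1.0, "pm")]
        j = "W%d" % min(q + 1, N_W)       # W4 loops on itself while working
        return [(j, 1.0 - p[q], "work"), ("CM1", p[q], "cm")]
    if i1 == "PM1":
        return [("W0", 1.0, "pm")]
    if i1 == "CM1":
        return [("CM2", 1.0, "cm")]
    return [("W0", 1.0, "cm")]            # CM2 -> W0

def stage_cost(kind, s):
    if kind == "work":
        return -G_TS * price[s]           # generation reward = negative cost
    return C_I + (C_PM if kind == "pm" else C_CM)

J = {(i1, s): 0.0 for i1 in comp_states for s in range(3)}  # terminal cost 0
for k in range(N - 1, -1, -1):
    Jk = {}
    for i1 in comp_states:
        for s in range(3):
            # PM is a real decision only in W1..WNW (Section 9.1.4.2)
            acts = (0, 1) if i1 in ("W1", "W2", "W3", "W4") else (0,)
            Jk[(i1, s)] = min(
                sum(pr * (stage_cost(kind, s)
                          + sum(PE[s][s2] * J[(j1, s2)] for s2 in range(3)))
                    for j1, pr, kind in moves(i1, u))
                for u in acts)
    J = Jk

print(J[("W0", 0)])   # expected cost-to-go of a new unit in scenario S1
```

With these assumed numbers, the cost-to-go is lower (more reward) in the high-price scenario and higher for an aged component, as expected; keeping the minimizing u at each state would give the replacement policy itself.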
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to also do maintenance on components that are still working but would need maintenance soon.
This can be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rent can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC    Number of components
NWc   Number of working states for component c
NPMc  Number of preventive maintenance states for component c
NCMc  Number of corrective maintenance states for component c
Costs
CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}  State of component c at the current stage
iNC+1                 Electricity state at the current stage
jc, c ∈ {1, ..., NC}  State of component c for the next stage
jNC+1                 Electricity state for the next stage
uc, c ∈ {1, ..., NC}  Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}  State of component c at stage k
xc                     A component state
xNC+1k                 Electricity state at stage k
uck                    Maintenance decision for component c at stage k
Probability functions
λc(i)  Failure probability function for component c
Sets
Ωxc      State space for component c
ΩxNC+1   Electricity state space
Ωuc(ic)  Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component, to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever any maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc(i) can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)T   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.
Component space

The numbers of CM and PM states for component c are NCMc and NPMc, respectively. The number of W states for component c, NWc, is decided in the same way as for one component. The state space of component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)T   (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc};  Ωuc(ic) = ∅ otherwise
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i) = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)   (9.4)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1: If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of the components considered independently:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)

Case 2: If one of the components is in maintenance, or preventive maintenance is decided, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC Pc

with

Pc = P(jc, uc, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
Pc = 1              if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic (a working component that is not maintained does not age while the system is down)
Pc = 0              otherwise
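Case 1 is simply a product over components. A small sketch (with made-up per-component distributions) shows the factorization:

```python
# Case 1 sketch: joint transition probability as the product of independent
# per-component transition distributions (the numbers are invented).
from itertools import product

def joint_transitions(per_comp):
    """per_comp: one {next_state: prob} dict per component.
       Returns {(next_1, ..., next_NC): prob}."""
    joint = {}
    for combo in product(*(d.items() for d in per_comp)):
        states = tuple(s for s, _ in combo)
        pr = 1.0
        for _, q in combo:
            pr *= q
        joint[states] = pr
    return joint

per_comp = [{"W2": 0.9, "CM1": 0.1},     # component 1
            {"W4": 0.7, "CM1": 0.3}]     # component 2
jp = joint_transitions(per_comp)
print(round(jp[("W2", "W4")], 10))       # -> 0.63
```

The joint distribution has one entry per combination of next states, which again illustrates why the joint state space grows multiplicatively with the number of components.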
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1: If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2: When the system is in maintenance or fails during the stage, an interruption cost CI is incurred, together with the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC Cc

with

Cc = CCMc  if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc  if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0     otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas of issues that could impact the model:
• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecast information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of dynamic programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of dynamic programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal of such an application.
The main limitation of dynamic programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the recent advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using dynamic programming for finite horizon models are possible: either a finite horizon model directly, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin (u ∈ {0, 1}) {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin (u ∈ {0, 1, 2}) {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin (u ∈ {1, 2}) {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin (u ∈ {0, 1}) {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin (u ∈ {0, 1, 2}) {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin (u ∈ {1, 2}) {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin (u ∈ {0, 1, 2}) {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
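The recursion above can be checked mechanically. The sketch below copies the stage costs C(k, i, j) from the appendix and runs the backward recursion:

```python
# Value iteration on the shortest-path example. C[(k, i, j)] are the arc costs
# used in Appendix A (node 0 at stage 0 is A; stage 4 is the terminal stage).
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}
J = {(4, 0): 0}                                   # terminal cost phi(0) = 0
for k in range(3, -1, -1):
    for i in {i for (kk, i, _) in C if kk == k}:
        J[(k, i)] = min(c + J[(k + 1, j)]
                        for (kk, ii, j), c in C.items() if kk == k and ii == i)

print(J[(0, 0)])   # -> 8
```

Keeping the minimizing successor at each step recovers the optimal path suggested by the argmins above (A, then node 2 at each of stages 1 and 2, then node 1 at stage 3), with total cost 3 + 2 + 1 + 2 = 8.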
Reference List
[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001
[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995
[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006
[4] N Andreacuteasson Optimisation of opportunistic replacement activities in deter-ministic and stochastic multi-component systems Technical report ChalmersGoumlteborg University 2004 Licentiate Thesis
[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996
[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994
[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965
[8] R Bellman Dynamic Programming Princeton University Press Princeton1957
[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997
[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976
[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979
65
[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005
[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996
[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006
[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991
[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997
[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966
[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004
[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982
[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004
[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004
[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004
[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996
[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999
66
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
67
[38] Rangan Alagar, Dimple Thyagarajan, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ U(i)} Q*(i, u)    (7.2)
By combining the two equations, we obtain:

Q*(i, u) = Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + min_{v ∈ U(j)} Q*(j, v)]    (7.3)
Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).
Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do:

Uk = argmin_{u ∈ U(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u ∈ U(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD-learning.
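As an illustration, this tabular update can be sketched in Python. The nested-dictionary Q-table and the call signature are assumptions of the sketch, not notation from the thesis:

```python
def q_update(Q, i, u, j, cost, gamma, actions):
    """One tabular Q-learning step for the sample (Xk=i, Uk=u, Xk+1=j, Ck=cost):
    Q[i][u] <- (1 - gamma) * Q[i][u] + gamma * (cost + min_v Q[j][v]),
    with gamma the step size, defined as for TD-learning."""
    best_next = min(Q[j][v] for v in actions(j))  # min over U(Xk+1)
    Q[i][u] = (1 - gamma) * Q[i][u] + gamma * (cost + best_next)
    return Q[i][u]
```

Repeating this update over a long stream of samples drives Q towards a fixed point of (7.3), provided every state-action pair keeps being visited.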
The exploration/exploitation trade-off. Convergence of the algorithms to the optimal solution would require that all pairs (x, u) be tried infinitely often, which is not realistic in practice.
In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
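A common implementation of this trade-off is the ε-greedy rule sketched below, which explores with probability ε and otherwise follows the greedy policy; the dictionary-based Q-table is an illustrative assumption:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """With probability epsilon, try a random control (exploration);
    otherwise pick the greedy control minimizing Q (exploitation)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return min(actions, key=lambda u: Q[state][u])
```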
7.3 Indirect Learning
On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section on each sample of experience, or

- building a model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.
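The second option can be sketched as follows; the class and its counting scheme are hypothetical, shown only to illustrate how transition probabilities and mean costs could be estimated on-line from samples:

```python
from collections import defaultdict

class EmpiricalModel:
    """Estimates transition probabilities and mean transition costs from
    observed samples (i, u, j, cost); the estimates can then be used for
    off-line training through simulation."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (i, u) -> {j: n}
        self.cost_sum = defaultdict(float)                   # (i, u, j) -> total cost

    def observe(self, i, u, j, cost):
        self.counts[(i, u)][j] += 1
        self.cost_sum[(i, u, j)] += cost

    def transition_prob(self, i, u, j):
        total = sum(self.counts[(i, u)].values())
        return self.counts[(i, u)][j] / total if total else 0.0

    def mean_cost(self, i, u, j):
        n = self.counts[(i, u)][j]
        return self.cost_sum[(i, u, j)] / n if n else 0.0
```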
7.4 Supervised Learning
With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function Jμ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the table representation previously investigated, Jμ(i) was stored for all values of i; with an approximation structure, only the vector r is stored.
Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) − J(i, r).
There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.

A general approach to a supervised learning problem can be:
• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Choose a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the learning performed in reinforcement learning is that no real training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
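As a toy stand-in for such an approximation structure, a linear approximation J(i, r) = r0 + r1 · i can be fitted to samples of the cost-to-go by least squares; the one-dimensional state and the closed-form fit are simplifying assumptions of this sketch:

```python
def fit_linear(samples):
    """Closed-form least-squares fit of J(i, r) = r0 + r1 * i
    to (state, cost-to-go) samples; returns the vector r = (r0, r1)."""
    n = len(samples)
    sx = sum(i for i, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(i * i for i, _ in samples)
    sxy = sum(i * y for i, y in samples)
    r1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    r0 = (sy - r1 * sx) / n
    return r0, r1

def approx_J(r, i):
    """Evaluate the approximation: only r is stored, not a table of J values."""
    r0, r1 = r
    return r0 + r1 * i
```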
Chapter 8

Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Wildeman et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], a SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure during the stage of a unit not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a three-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each is modeled by a failure state with a different time to repair.
The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.
An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the calculated state probabilities and optimal mean time to preventive maintenance.
The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given: it considers three deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance actions are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.
The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined for deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require an explicit model of the system: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model; several classical solution approaches possible
- Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor
- Discounted: short-term maintenance optimization; policy iteration (PI) faster in general
- Shortest path: linear programming possible, with additional constraints; state space limited for VI and PI

Approximate Dynamic Programming for MDP
- Characteristics: can handle larger state spaces than classical MDP methods
- Possible application in maintenance optimization: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application in maintenance optimization: optimization for inspection-based maintenance
- Methods: same as MDP
- Advantages/disadvantages: complex (average cost-to-go approach)
Chapter 9

A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component, and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
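The backward recursion behind finite horizon value iteration can be sketched generically as follows; the callback signatures (P returning a list of (next state, probability) pairs) are assumptions of the sketch, not notation from the thesis:

```python
def finite_horizon_vi(states, actions, P, C, N, terminal_cost):
    """Backward value iteration for a finite horizon SDP:
    J_N(i) = terminal_cost(i),
    J_k(i) = min_u sum_j P(j, u, i) * (C(j, u, i) + J_{k+1}(j)).
    Returns the cost-to-go tables J[k][i] and an optimal policy[k][i]."""
    J = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    for i in states:
        J[N][i] = terminal_cost(i)
    for k in range(N - 1, -1, -1):          # backwards from stage N-1 to 0
        for i in states:
            best_u, best_val = None, float("inf")
            for u in actions(i):
                val = sum(p * (C(j, u, i, k) + J[k + 1][j])
                          for j, p in P(u, i, k))
                if val < best_val:
                    best_u, best_val = u, val
            J[k][i], policy[k][i] = best_val, best_u
    return J, policy
```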
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity is considered an important factor that can influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
Conversely, if a high electricity price is expected in the near future, it can be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was adopted in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers

NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for the component
NCM   Number of corrective maintenance states for the component

Costs

CE(s, k)  Electricity cost at stage k for electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1k  Component state at stage k
x2k  Electricity state at stage k

Probability functions

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets

Ωx1    Component state space
Ωx2    Electricity state space
ΩU(i)  Decision space for state i

State notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• Manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k)ᵀ,  x1k ∈ Ωx1, x2k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component, and Ωx2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached, i.e. λ(t) = λ(Tmax) for t > Tmax; Tmax can then correspond, for example, to the time when λ(t) reaches 50%. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
[Figure 9.1: Example of a Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1. From state Wq, the component moves to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); the PM and CM states progress deterministically back to W0.]
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PM_(NPM−1), CM1, ..., CM_(NCM−1)}
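The construction of Ωx1 can be made concrete with a few lines of code; the string labels are an illustrative encoding:

```python
def component_state_space(n_w, n_pm, n_cm):
    """Enumerate {W0..W_NW, PM1..PM_(NPM-1), CM1..CM_(NCM-1)}; W0 also plays
    the role of PM_NPM and CM_NCM (maintenance just completed)."""
    states = ["W%d" % q for q in range(n_w + 1)]
    states += ["PM%d" % q for q in range(1, n_pm)]
    states += ["CM%d" % q for q in range(1, n_cm)]
    return states
```

For the example of Figure 9.1 (NW = 4, NPM = 2, NCM = 3) this yields W0–W4, PM1, CM1 and CM2.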
Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydro power provides a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2: Example of electricity scenarios, NE = 3. Electricity prices (SEK/MWh, roughly 200–500) are shown for Scenarios 1–3 over stages k−1, k and k+1.]
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and ∅ otherwise.
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1. Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0   CM1     λ(Wq)
WNW                         0   WNW     1 − λ(WNW)
WNW                         0   CM1     λ(WNW)
Wq, q ∈ {0, ..., NW}        1   PM1     1
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   1
PM_(NPM−1)                  ∅   W0      1
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   1
CM_(NCM−1)                  ∅   W0      1
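The table can be encoded as a small function; states are represented here as ('W', q), ('PM', q) and ('CM', q) tuples, an encoding chosen only for illustration:

```python
def component_transition(state, u, lam, n_w, n_pm, n_cm):
    """Transition distribution {next state: probability} for one component,
    following Table 9.1.  lam(q) is the failure probability from working
    state Wq during one stage; u = 1 starts a preventive replacement.
    If NPM = 1 (resp. NCM = 1), PM1 (resp. CM1) is identified with W0."""
    kind, q = state
    if kind == 'W':
        if u == 1:
            return {(('PM', 1) if n_pm > 1 else ('W', 0)): 1.0}
        ageing = ('W', min(q + 1, n_w))      # W_NW is absorbing while working
        failed = ('CM', 1) if n_cm > 1 else ('W', 0)
        return {ageing: 1.0 - lam(q), failed: lam(q)}
    if kind == 'PM':                          # deterministic progression
        return {(('PM', q + 1) if q < n_pm - 1 else ('W', 0)): 1.0}
    return {(('CM', q + 1) if q < n_cm - 1 else ('W', 0)): 1.0}
```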
Table 9.2: Example of transition matrices for electricity scenarios

P1E =
  1    0    0
  0    1    0
  0    0    1

P2E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
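The example matrices of Tables 9.2 and 9.3 can be written down directly; the names P1, P2, P3 and SCHEDULE are ours, not the thesis's:

```python
# Rows are the current scenario i2, columns the next scenario j2 (Table 9.2).
P1 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0],
      [0.0, 0.0, 1.0]]
P2 = [[1 / 3, 1 / 3, 1 / 3],
      [1 / 3, 1 / 3, 1 / 3],
      [1 / 3, 1 / 3, 1 / 3]]
P3 = [[0.6, 0.2, 0.2],
      [0.2, 0.6, 0.2],
      [0.2, 0.2, 0.6]]

# Stage-dependent choice of matrix over the 12-stage horizon of Table 9.3.
SCHEDULE = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

def electricity_transition(k, i2, j2):
    """P_k(j2, i2): probability of moving from scenario i2 to j2 at stage k."""
    return SCHEDULE[k][i2][j2]
```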
9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• reward for electricity generation, G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• cost for maintenance, CCM or CPM

• cost for interruption, CI

Moreover, a terminal cost, denoted CN, can be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                          u   j1      Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0   CM1     CI + CCM
WNW                         0   WNW     G · Ts · CE(i2, k)
WNW                         0   CM1     CI + CCM
Wq                          1   PM1     CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   CI + CPM
PM_(NPM−1)                  ∅   W0      CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   CI + CCM
CM_(NCM−1)                  ∅   W0      CI + CCM
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it can be profitable to do maintenance on some components of the system that are still working but would require maintenance soon anyway.
This can be very interesting if the interruption cost is high, or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat may be necessary for certain maintenance actions. The rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c

Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  Electricity state at the current stage
jc, c ∈ {1, ..., NC}   State of component c for the next stage
jNC+1                  Electricity state for the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}  State of component c at stage k
xc                     A component state
xNC+1,k                Electricity state at stage k
uck                    Maintenance decision for component c at stage k

Probability functions

λc(i)  Failure probability function for component c

Sets

Ωxc       State space for component c
ΩxNC+1    Electricity state space
Ωuc(ic)   Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component, to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNC,k, xNC+1,k)ᵀ    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM_(NPMc−1), CM1, ..., CM_(NCMc−1)}

Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether or not to do preventive maintenance, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNC,k)ᵀ    (9.3)

The decision space for each decision variable is defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and ∅ otherwise.
9.2.4.3 Transition Probabilities

The component state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)    (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The component state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and u = 0:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = Π_{c=1}^{NC} P(jc, 0, ic)

Case 2
If one of the components is in maintenance, or preventive maintenance is decided:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = Π_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1             if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
P^c = 0             otherwise
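The two cases can be combined in one function; per_component_P and working are hypothetical callbacks standing in for the single-component transition table and the working-state test:

```python
def system_transition_prob(i, j, u, per_component_P, working):
    """Joint component transition probability P((j1..jNC), (u1..uNC), (i1..iNC)),
    the electricity state being handled separately.  Case 1: all components
    working and no maintenance decided -> components age independently.
    Case 2: otherwise the system is down, so non-maintained working
    components keep their state."""
    n = len(i)
    all_up = all(working(c, i[c]) for c in range(n)) and not any(u)
    prob = 1.0
    for c in range(n):
        if all_up:
            prob *= per_component_P(c, j[c], 0, i[c])
        elif u[c] == 1 or not working(c, i[c]):
            prob *= per_component_P(c, j[c], 1, i[c])
        else:
            prob *= 1.0 if j[c] == i[c] else 0.0   # frozen working component
    return prob
```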
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and u = 0:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} C^c

with

C^c = CCMc  if ic ∈ {CM1, ..., CM_(NCMc−1)} or jc = CM1
C^c = CPMc  if ic ∈ {PM1, ..., PM_(NPMc−1)} or jc = PM1
C^c = 0     otherwise
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas of issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, instead of individual decision spaces for each component state variable.

• Other types of maintenance actions: in the model, replacement is the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of dynamic programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The policy iteration algorithm is empirically shown to converge fastest; however, for a high discount factor the value iteration algorithm can be better. Linear programming can also be used if additional constraints need to be included in the model. Approximate dynamic programming methods are necessary for large state spaces.
A maintenance model based on finite horizon stochastic dynamic programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of dynamic programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP could, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly via a finite horizon model, or via a discounted infinite horizon model, which approximates the finite horizon model but must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with several monitored parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
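The backward recursion above can also be reproduced programmatically. The following sketch encodes the arc costs C(k, i, j) used in the calculation and runs the value iteration, recovering the optimal cost J*_0(0) = 8 and the optimal decisions at each stage; the variable names are my own, not from the thesis.

```python
# Backward value iteration for the shortest-path example.
# C[k] maps each state i at stage k to a dict {j: C(k, i, j)} of reachable states.
C = [
    {0: {0: 2, 1: 4, 2: 3}},                                    # stage 0 (node A)
    {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},  # stage 1 (B, C, D)
    {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},  # stage 2 (E, F, G)
    {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                          # stage 3 (H, I, J)
]

J = {0: 0}                       # terminal cost phi(0) = 0
policy = []
for k in range(3, -1, -1):       # stages 3, 2, 1, 0 (backwards)
    Jk, uk = {}, {}
    for i, arcs in C[k].items():
        u = min(arcs, key=lambda j: arcs[j] + J[j])   # best successor state
        uk[i] = u
        Jk[i] = arcs[u] + J[u]
    policy.insert(0, uk)
    J = Jk

print(J[0])         # optimal cost-to-go from node A
print(policy[0][0]) # optimal first decision
```
Following the policy forward (A → D → G → I → terminal) reproduces the optimal path with total cost 3 + 2 + 1 + 2 = 8.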
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
7.4 Supervised Learning
With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces, they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.
As an example, consider a cost-to-go function Jµ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of Jµ. In the tabular representation investigated previously, Jµ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.
Function approximators must generalize well over the state space from the information gained in the samples. In other words, they should minimize the error between the true function and its approximation, Jµ(i) − J̃(i, r).
There are many possible methods for function approximation; this field is related to supervised learning. Possible methods include, for example, artificial neural networks, kernel-based methods, tree-based methods, and Bayesian statistics.
A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.
• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.
• Decide on a training algorithm.
• Gather a training set.
• Train the function with the training set. The function can then be validated using a subset of the training set.
• Evaluate the performance of the approximated function using a test set.
An important difference between classical supervised learning and the supervised learning performed in reinforcement learning is that no real training set exists: the training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
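As an illustration of the idea, here is a minimal sketch of a function approximator: a linear architecture J̃(i, r) = r0 + r1·i, where the parameter vector r is fitted by least squares to a handful of (state, sampled cost-to-go) pairs. Both the sample values and the choice of a linear feature are hypothetical, chosen only to show that the whole table is replaced by two numbers.

```python
# Hypothetical sketch: approximate sampled cost-to-go values J_mu(i) with a
# linear architecture J(i, r) = r0 + r1*i, storing only the vector r = (r0, r1).
# The sample values below are invented for illustration.
samples = {0: 1.0, 2: 2.1, 4: 2.9, 6: 4.2, 8: 4.8}   # state i -> sampled J_mu(i)

n = len(samples)
sx = sum(samples)                                # sum of states
sy = sum(samples.values())                       # sum of sampled costs
sxx = sum(i * i for i in samples)
sxy = sum(i * y for i, y in samples.items())
r1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # least-squares slope
r0 = (sy - r1 * sx) / n                          # least-squares intercept

def J_approx(i):
    """Cost-to-go estimate for *any* state i, sampled or not."""
    return r0 + r1 * i

# The approximation generalizes to a state that was never sampled:
print(J_approx(5))
```
In practice the architecture would use richer features and one of the supervised learning methods listed above, but the principle is the same: only r is stored, and unsampled states are covered by generalization.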
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming
8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates
are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models
8.2.1 Discrete Time Infinite Horizon Models
In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.
First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance just calculated.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.
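To make the policy iteration step concrete, the following is a generic discounted policy-iteration sketch on a hypothetical three-state deterioration MDP (states 0 = good, 1 = worn, 2 = failed; actions 0 = continue, 1 = replace). All transition probabilities, costs, and the discount factor are illustrative; this is not the exact average-cost model of [14].

```python
# Hypothetical 3-state deterioration MDP; P[a][s] = {s': prob}, costs per stage.
P = {
    0: {0: {0: 0.8, 1: 0.2}, 1: {1: 0.6, 2: 0.4}, 2: {2: 1.0}},  # continue
    1: {0: {0: 1.0}, 1: {0: 1.0}, 2: {0: 1.0}},                  # replace
}
cost = {0: {0: 0.0, 1: 2.0, 2: 10.0}, 1: {0: 5.0, 1: 5.0, 2: 5.0}}
gamma = 0.9
states, actions = [0, 1, 2], [0, 1]

policy = {s: 0 for s in states}          # initial policy: never replace
while True:
    # Policy evaluation: iterate the fixed-policy Bellman equation to a tolerance.
    J = {s: 0.0 for s in states}
    while True:
        Jn = {s: cost[policy[s]][s]
                 + gamma * sum(p * J[t] for t, p in P[policy[s]][s].items())
              for s in states}
        if max(abs(Jn[s] - J[s]) for s in states) < 1e-10:
            break
        J = Jn
    # Policy improvement: greedy with respect to the evaluated cost-to-go.
    new = {s: min(actions, key=lambda a: cost[a][s]
                  + gamma * sum(p * J[t] for t, p in P[a][s].items()))
           for s in states}
    if new == policy:                    # stable policy: optimal, stop
        break
    policy = new

print(policy)
```
In the unichain average-cost setting of [14], the evaluation step instead solves for a gain and bias vector, but the evaluate/improve structure of the iteration is the same.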
Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.
The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air-blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). This assumption is related to the principle of optimality: it means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled condition-based maintenance, and SMDPs for inspection-based condition-based maintenance. However, for continuous-time monitoring, it would be recommended to use approximate methods.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
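A minimal tabular Q-learning sketch illustrates the model-free idea: the MDP below (a hypothetical three-state deterioration system with continue/replace actions, invented for illustration) is used only to simulate transitions, while the learner itself sees nothing but sampled states and costs.

```python
import random

random.seed(0)

# Hypothetical deterioration MDP (0 = good, 1 = worn, 2 = failed;
# actions: 0 = continue, 1 = replace). Used only as a simulator.
P = {
    0: {0: {0: 0.8, 1: 0.2}, 1: {1: 0.6, 2: 0.4}, 2: {2: 1.0}},
    1: {0: {0: 1.0}, 1: {0: 1.0}, 2: {0: 1.0}},
}
cost = {0: {0: 0.0, 1: 2.0, 2: 10.0}, 1: {0: 5.0, 1: 5.0, 2: 5.0}}
gamma, alpha = 0.9, 0.1

Q = {(s, a): 0.0 for s in (0, 1, 2) for a in (0, 1)}
s = 0
for _ in range(20000):
    a = random.choice((0, 1))                  # purely exploratory behaviour
    s2 = random.choices(list(P[a][s]), weights=list(P[a][s].values()))[0]
    # Cost-minimizing Q-update from the sampled transition only.
    target = cost[a][s] + gamma * min(Q[(s2, b)] for b in (0, 1))
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    s = s2

greedy = {s: min((0, 1), key=lambda a: Q[(s, a)]) for s in (0, 1, 2)}
print(greedy)
```
With a decaying learning rate and continued exploration, the greedy policy extracted from Q converges to the optimal policy; a fixed step size is used here for simplicity. For large state spaces, the table Q would be replaced by a supervised learning structure as discussed in Section 7.4.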
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
• Characteristics: the model can be non-stationary
• Possible application in maintenance optimization: short-term maintenance scheduling
• Methods: value iteration
• Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
• Characteristics: stationary model; average cost-to-go, discounted, and shortest-path formulations are possible
• Possible application: continuous-time condition monitoring maintenance optimization; short-term maintenance optimization
• Methods: Value Iteration (VI), which can converge fast for a high discount factor; Policy Iteration (PI), faster in general; Linear Programming, which allows possible additional constraints
• Advantages/disadvantages: state space limited for VI and PI

Semi-Markov Decision Processes
• Characteristics: can optimize the inspection interval; complex (average cost-to-go approach)
• Possible application: optimization of inspection-based maintenance
• Methods: same as MDP

Approximate Dynamic Programming
• Characteristics: can handle state spaces larger than classical MDP methods
• Possible application: same as MDP, for larger systems
• Methods: TD-learning and Q-learning, which can work without an explicit model
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multiple components. Both these models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries a large part of the electricity is based on hydropower. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low, and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers

NE: number of electricity scenarios
NW: number of working states for the component
NPM: number of preventive maintenance states for the component
NCM: number of corrective maintenance states for the component

Costs

CE(s, k): electricity cost at stage k for electricity state s
CI: cost per stage for interruption
CPM: cost per stage of preventive maintenance
CCM: cost per stage of corrective maintenance
CN(i): terminal cost if the component is in state i

Variables

i1: component state at the current stage
i2: electricity state at the current stage
j1: possible component state for the next stage
j2: possible electricity state for the next stage

State and Control Space

x1k: component state at stage k
x2k: electricity state at stage k

Probability functions

λ(t): failure rate of the component at age t
λ(i): failure rate of the component in state Wi

Sets

Ωx1: component state space
Ωx2: electricity state space
ΩU(i): decision space for state i

State notations

W: working state
PM: preventive maintenance state
CM: corrective maintenance state
9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.
• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.
• If the system is not working, a cost for interruption, CI per stage, is considered.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2  (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable, x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case, Tmax can correspond, for example, to the time when λ(t) > 50% if t > Tmax. This approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
50
Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0; dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with one state variable, x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3 (electricity prices in SEK/MWh plotted over the stages for Scenarios 1–3).
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and ∅ otherwise.
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                           u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0   CM1     λ(Wq)
WNW                          0   WNW     1 − λ(WNW)
WNW                          0   CM1     λ(WNW)
Wq, q ∈ {0, ..., NW}         1   PM1     1
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1   1
PMNPM−1                      ∅   W0      1
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1   1
CMNCM−1                      ∅   W0      1
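The rows of Table 9.1 can be encoded directly as a transition function. The sketch below uses the layout of Figure 9.1 (NW = 4, NPM = 2, NCM = 3) and a hypothetical increasing per-stage failure probability lam(q); it then checks that every row of the implied transition matrix sums to one.

```python
# Component state space and transitions following Table 9.1 / Figure 9.1.
NW, NPM, NCM = 4, 2, 3
W = [f"W{q}" for q in range(NW + 1)]      # W0 .. W4
PM = [f"PM{q}" for q in range(1, NPM)]    # PM1
CM = [f"CM{q}" for q in range(1, NCM)]    # CM1, CM2
states = W + PM + CM

def lam(q):
    # Hypothetical per-stage failure probability, increasing with age.
    return 0.05 * (q + 1)

def P(j, u, i):
    """Transition probability P(j | u, i) following the rows of Table 9.1."""
    if i in W:
        q = int(i[1:])
        if u == 1:                         # preventive replacement starts
            return 1.0 if j == "PM1" else 0.0
        if j == W[min(q + 1, NW)]:         # survive: age one stage (capped at W_NW)
            return 1.0 - lam(q)
        if j == "CM1":                     # fail: corrective maintenance starts
            return lam(q)
        return 0.0
    # Maintenance states progress deterministically back to W0.
    seq = PM + ["W0"] if i in PM else CM + ["W0"]
    return 1.0 if j == seq[seq.index(i) + 1] else 0.0

# Sanity check: every row is a probability distribution.
for i in states:
    for u in ([0, 1] if i in W else [0]):
        assert abs(sum(P(j, u, i) for j in states) - 1.0) < 1e-12
```
The failure-rate function and its numerical values are placeholders; in the thesis model they would come from the known λ(t) of the component.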
Table 9.2: Example of transition matrices for electricity scenarios

P1E = [1 0 0; 0 1 0; 0 0 1]
P2E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]
P3E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):   0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):  P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
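The matrices of Tables 9.2 and 9.3 can be checked and exercised numerically. The short sketch below verifies that each row is a probability distribution and propagates an initial scenario distribution through the 12-stage schedule; after the three uniform-mixing stages with P2E, the distribution is uniform and stays uniform under P1E and the (doubly stochastic) P3E.

```python
# Scenario transition matrices of Table 9.2.
P1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P2 = [[1 / 3] * 3 for _ in range(3)]
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Table 9.3: which matrix applies at each of the 12 stages.
schedule = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

# Every row of every matrix must sum to one.
for M in (P1, P2, P3):
    for row in M:
        assert abs(sum(row) - 1.0) < 1e-12

# Probability of being in each scenario at the horizon, starting from S1.
p = [1.0, 0.0, 0.0]
for M in schedule:
    p = [sum(p[i] * M[i][j] for i in range(3)) for j in range(3)]

print([round(x, 3) for x in p])  # uniform over the three scenarios
```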
9144 Cost Function
The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: CCM or CPM
• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon; this option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                          u    j1       Ck(j | u, i)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0    CM1      CI + CCM
WNW                         0    WNW      G · Ts · CE(i2, k)
WNW                         0    CM1      CI + CCM
Wq                          1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    CI + CPM
PMNPM−1                     ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    CI + CCM
CMNCM−1                     ∅    W0       CI + CCM
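The cost rules of Table 9.4 can likewise be sketched as a function. Again this is my own illustrative encoding (string states, argument names), not code from the thesis:

```python
# Sketch of the one-component transition costs in Table 9.4.
# States are strings "W0".., "PM1".., "CM1".. ; C_E is the electricity
# price function CE(i2, k); G, Ts, C_I, C_CM, C_PM as in Section 9.1.2.

def transition_cost(i1, u, j1, i2, k, G, Ts, C_E, C_I, C_CM, C_PM):
    # Failure during the stage, or corrective maintenance in progress.
    if j1.startswith("CM") or i1.startswith("CM"):
        return C_I + C_CM
    # Preventive replacement decided, or preventive maintenance in progress.
    if u == 1 or j1.startswith("PM") or i1.startswith("PM"):
        return C_I + C_PM
    # Unit working the whole stage: value of the energy produced.
    return G * Ts * C_E(i2, k)
```

Note that the production term G · Ts · CE(i2, k) is a reward; in a pure cost-minimization formulation it would enter with a negative sign.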
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be especially interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could then be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c
Costs
CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}    State of component c at the current stage
iNC+1                   Electricity state at the current stage
jc, c ∈ {1, ..., NC}    State of component c at the next stage
jNC+1                   Electricity state at the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}   State of component c at stage k
xc                      A component state
xNC+1,k                 Electricity state at stage k
uck                     Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
• An interruption cost CI is considered whenever maintenance is performed on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)T    (9.2)

xck, c ∈ {1, ..., NC} represents the state of component c, and xNC+1,k represents the electricity state.
Component space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system.
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)T    (9.3)
The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and Ωuc(ic) = ∅ otherwise.
9.2.4.3 Transition Probability
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1 | iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1 | iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W0, ..., WNWc},

P((j1, ..., jNC) | 0, (i1, ..., iNC)) = ∏ (c = 1 to NC) P(jc | 0, ic)
Case 2

If one of the components is in maintenance, or preventive maintenance is decided for some component, then

P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) = ∏ (c = 1 to NC) Pc

with Pc =
  P(jc | 1, ic)   if uc = 1 or ic ∉ {W0, ..., WNWc}
  1               if ic ∈ {W0, ..., WNWc}, uc = 0 and jc = ic
  0               otherwise
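The two cases above can be combined into one function computing the joint component-state transition probability. A minimal sketch, assuming string-encoded states and a one-component probability function `p_single(j, u, i)` (both my own conventions, not from the thesis):

```python
# Joint component-state transition probability (Cases 1 and 2).
# i, j are tuples of component states; u is the tuple of decisions.

def joint_transition_prob(j, u, i, p_single):
    all_working = all(s.startswith("W") for s in i)
    no_maintenance = all(uc == 0 for uc in u)
    if all_working and no_maintenance:
        # Case 1: the system runs, components age independently.
        p = 1.0
        for jc, ic in zip(j, i):
            p *= p_single(jc, 0, ic)
        return p
    # Case 2: the system is stopped during the stage; components under
    # maintenance (or sent to maintenance) progress, the others stay put.
    p = 1.0
    for jc, uc, ic in zip(j, u, i):
        if uc == 1 or not ic.startswith("W"):
            p *= p_single(jc, 1, ic)
        elif jc != ic:
            return 0.0
    return p
```

The early `return 0.0` encodes the third branch of the definition of Pc: a working, undisturbed component cannot change state while the system is stopped.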
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W0, ..., WNWc},

C((j1, ..., jNC) | 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ (c = 1 to NC) Cc

with Cc =
  CCMc   if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
  CPMc   if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
  0      otherwise
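The two cost cases can be sketched in the same style as the transition probabilities; `C_CM[c]` and `C_PM[c]` play the roles of CCMc and CPMc (encoding and names are mine, not from the thesis):

```python
# Multi-component stage cost (Cases 1 and 2). A single interruption cost
# C_I is charged in Case 2, plus one maintenance cost per affected component.

def joint_stage_cost(j, u, i, i_elec, k, G, Ts, C_E, C_I, C_CM, C_PM):
    all_working = all(s.startswith("W") for s in i)
    no_decision = all(uc == 0 for uc in u)
    no_failure = all(jc != "CM1" for jc in j)
    if all_working and no_decision and no_failure:
        # Case 1: the whole system runs; reward for the energy produced.
        return G * Ts * C_E(i_elec, k)
    # Case 2: interruption cost plus the cost of every maintenance action.
    total = C_I
    for c, (jc, ic) in enumerate(zip(j, i)):
        if ic.startswith("CM") or jc == "CM1":
            total += C_CM[c]
        elif ic.startswith("PM") or jc == "PM1":
            total += C_PM[c]
    return total
```

Charging CI only once per stage, regardless of how many components are maintained, is exactly what makes grouping maintenance actions (opportunistic maintenance) attractive in this model.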
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be performed at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm was empirically shown to converge fastest; however, for a high discount factor, the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With new advances in ADP methods, this limitation could be overcome. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either a finite horizon model directly, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
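The backward recursion can be checked with a short script. The arc costs below are read directly from the computations above (the dictionary encoding is mine, not from the thesis):

```python
# cost[k][(i, j)] is the arc cost C(k, i, j) of going from state i at
# stage k to state j at stage k+1; phi is the terminal cost.
cost = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},     # stage 0 (A)
    {(0, 0): 4, (0, 1): 6,                 # stage 1 (B, C, D)
     (1, 0): 2, (1, 1): 1, (1, 2): 3,
     (2, 1): 5, (2, 2): 2},
    {(0, 0): 2, (0, 1): 5,                 # stage 2 (E, F, G)
     (1, 0): 7, (1, 1): 3, (1, 2): 2,
     (2, 1): 1, (2, 2): 2},
    {(0, 0): 4, (1, 0): 2, (2, 0): 7},     # stage 3 (H, I, J)
]
phi = {0: 0}                               # terminal stage

def solve(cost, phi):
    """Backward value iteration: J_k(i) = min_j [C(k, i, j) + J_{k+1}(j)]."""
    J = dict(phi)                          # cost-to-go at the final stage
    for k in range(len(cost) - 1, -1, -1):
        Jk = {}
        for (i, j), c in cost[k].items():
            cand = c + J[j]
            if i not in Jk or cand < Jk[i]:
                Jk[i] = cand
        J = Jk
    return J

print(solve(cost, phi))   # {0: 8}, i.e. J*(A) = 8 as in the recursion above
```

The intermediate dictionaries reproduce the stage values computed by hand (6, 5, 3 at stage 2 and 10, 6, 5 at stage 1).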
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers / Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Alagar Rangan, Dimple Thyagarajan, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Chapter 8
Review of Models for Maintenance Optimization
This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.
8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models
Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.
8.1.2 Stochastic Models
In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.
One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.
The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models
In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.
The model is solved using the linear programming method
8.2.2 Semi-Markov Decision Processes
Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for the monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality: the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is a consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be advisable to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an existing model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 summarizes the models and the most important methods.
Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; possible approaches: average cost-to-go, discounted cost, shortest path
  Possible applications: continuous-time condition monitoring maintenance optimization; short-term maintenance optimization
  Classical methods for MDP:
    - Value Iteration (VI): can converge fast for a high discount factor
    - Policy Iteration (PI): faster in general
    - Linear Programming: possible additional constraints; state space more limited than with VI and PI

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application: optimization for inspection-based maintenance
  Methods: same as MDP (average cost-to-go approach); more complex

Approximate Dynamic Programming
  Characteristics: can handle large state spaces compared with classical MDP methods
  Possible application: same as MDP, for larger systems
  Methods: TD-learning, Q-learning; can work without an explicit model
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model
In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance in a profitable period. This idea was incorporated in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could serve as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers

NE     Number of electricity scenarios
NW     Number of working states for the component
NPM    Number of preventive maintenance states for the component
NCM    Number of corrective maintenance states for the component

Costs

CE(s, k)   Electricity cost at stage k for electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i

Variables

i1    Component state at the current stage
i2    Electricity state at the current stage
j1    Possible component state for the next stage
j2    Possible electricity state for the next stage

State and Control Space

x1k   Component state at stage k
x2k   Electricity state at stage k

Probability function

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets

Ωx1      Component state space
Ωx2      Electricity state space
ΩU(i)    Decision space for state i

States notations

W     Working state
PM    Preventive maintenance state
CM    Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
49
bull A terminal cost (for stage N) can be used to penalize the terminal stagecondition
bull The manpower is assumed unlimited Spare parts are not considered
9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable, x1k. There are three types of possible states for this variable: working states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component are NCM and NPM, respectively.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50 % for t > Tmax. The latter approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
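To make the choice of NW concrete, the following sketch computes it for a hypothetical increasing failure-rate function. The Weibull-style hazard, Ts, Tmax and all numbers are invented for illustration and are not taken from the thesis:

```python
# Sketch: choosing the number of working states N_W for an assumed
# increasing failure-rate function lam(t). Beyond T_max the failure rate
# is held constant, as the text assumes.

def lam(t, scale=10.0, shape=2.0):
    # hypothetical Weibull-style hazard rate, increasing in t
    return (shape / scale) * (t / scale) ** (shape - 1)

T_s = 1.0      # stage length, e.g. one week (assumed value)
T_max = 8.0    # age beyond which lam is held constant (assumed value)

N_W = round(T_max / T_s)     # number of working states W_0 .. W_{N_W}
# per-state failure rate lam(W_q) = lam(q * T_s), capped at lam(T_max)
lam_state = [lam(min(q * T_s, T_max)) for q in range(N_W + 1)]

print(N_W)   # -> 8
```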
Figure 9.1: Example of a Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4 (states W0, ..., W4, PM1, CM1, CM2; from each state Wq the component fails with probability Ts·λ(q) and otherwise moves on with probability 1 − Ts·λ(q), while the maintenance states advance with probability 1). Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state

Electricity scenarios are associated with one state variable, x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. Consequently, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3. The three curves show the electricity price (SEK/MWh, roughly between 200 and 500) for scenarios 1, 2 and 3 around stages k−1, k and k+1.
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance,
Uk = 1: preventive maintenance.

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW},
        ∅ otherwise.
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0   CM1     λ(Wq)
WNW                         0   WNW     1 − λ(WNW)
WNW                         0   CM1     λ(WNW)
Wq, q ∈ {0, ..., NW}        1   PM1     1
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   1
PMNPM−1                     ∅   W0      1
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   1
CMNCM−1                     ∅   W0      1
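The transition probabilities of Table 9.1 can be sketched in code as follows. The state names, the hypothetical failure rates and the use of Ts·λ(Wq) as the per-stage failure probability (as in Figure 9.1) are illustrative assumptions, not the thesis implementation:

```python
# Sketch of the stationary component transition probabilities of Table 9.1,
# for the example of Figure 9.1: N_W = 4, N_PM = 2, N_CM = 3.

T_s = 1.0
lam = {q: 0.05 * (q + 1) for q in range(5)}   # hypothetical lam(W_q)

N_W, N_PM, N_CM = 4, 2, 3
states = [f"W{q}" for q in range(N_W + 1)] \
       + [f"PM{q}" for q in range(1, N_PM)] \
       + [f"CM{q}" for q in range(1, N_CM)]

def P(j, u, i):
    """Transition probability P(j1, u, i1) as in Table 9.1."""
    if i.startswith("W"):
        q = int(i[1:])
        if u == 1:                      # preventive replacement starts
            return 1.0 if j == "PM1" else 0.0
        nxt = f"W{min(q + 1, N_W)}"     # W_{N_W} stays W_{N_W} while working
        if j == nxt:
            return 1.0 - T_s * lam[q]
        if j == "CM1":
            return T_s * lam[q]
        return 0.0
    # maintenance chains advance deterministically and end in W0
    kind, q = i[:2], int(i[2:])
    last = (N_PM - 1) if kind == "PM" else (N_CM - 1)
    nxt = "W0" if q == last else f"{kind}{q + 1}"
    return 1.0 if j == nxt else 0.0

# sanity check: each row of the transition matrix sums to one
assert all(abs(sum(P(j, 0, i) for j in states) - 1.0) < 1e-9 for i in states)
```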
Table 9.2: Example of transition matrices for electricity scenarios

P1E =
  1    0    0
  0    1    0
  0    0    1

P2E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
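The matrices of Table 9.2 and the schedule of Table 9.3 can be written down directly; the short check below verifies that each matrix is row-stochastic and propagates an initial scenario distribution through the 12 stages (the starting distribution is an assumption for illustration):

```python
import numpy as np

# The electricity-scenario transition matrices of Table 9.2 (N_E = 3)
P1 = np.eye(3)                           # scenarios frozen
P2 = np.full((3, 3), 1.0 / 3.0)          # fully mixing
P3 = np.array([[0.6, 0.2, 0.2],
               [0.2, 0.6, 0.2],
               [0.2, 0.2, 0.6]])         # "sticky" scenarios

# the 12-stage schedule of Table 9.3
schedule = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

# every matrix is a proper stochastic matrix: rows sum to one
for Pk in schedule:
    assert np.allclose(Pk.sum(axis=1), 1.0)

# marginal scenario distribution at the final stage, starting in scenario 1
p = np.array([1.0, 0.0, 0.0])
for Pk in schedule:
    p = p @ Pk
print(p)   # approximately [1/3, 1/3, 1/3]: P2E mixes to uniform, P3E and P1E preserve it
```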
9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• a reward for electricity generation, G · Ts · CE(i2, k) (depending on the electricity scenario state i2 and the stage k),
• a cost for maintenance, CCM or CPM,
• a cost for interruption, CI.

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                          u   j1      Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0   CM1     CI + CCM
WNW                         0   WNW     G · Ts · CE(i2, k)
WNW                         0   CM1     CI + CCM
Wq                          1   PM1     CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   CI + CPM
PMNPM−1                     ∅   W0      CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   CI + CCM
CMNCM−1                     ∅   W0      CI + CCM
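A minimal sketch of the stage cost of Table 9.4 follows. All constants (G, Ts, CI, CPM, CCM and the price function) are invented for illustration; the reward row is counted as a negative cost, since the model minimizes costs:

```python
# Sketch (assumed constants) of the one-component transition costs of
# Table 9.4. Negative values are rewards for produced electricity.

G, T_s = 1000.0, 1.0             # kW and hours per stage (assumed)
C_I, C_PM, C_CM = 500.0, 200.0, 800.0   # assumed cost constants

def C_E(s, k):                   # hypothetical price per kWh, per scenario s
    return [0.30, 0.40, 0.50][s]

def stage_cost(i1, i2, u, j1, k):
    """Transition cost C_k(j, u, i) of Table 9.4 (reward counted negative)."""
    if i1.startswith("W") and u == 0:
        if j1 == "CM1":                    # failure during the stage
            return C_I + C_CM
        return -G * T_s * C_E(i2, k)       # electricity produced
    if u == 1 or i1.startswith("PM"):      # preventive maintenance stages
        return C_I + C_PM
    return C_I + C_CM                      # corrective maintenance stages

print(stage_cost("W2", 1, 0, "W3", k=0))   # -> -400.0
```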
9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c

Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  Electricity state at the current stage
jc, c ∈ {1, ..., NC}   State of component c for the next stage
jNC+1                  Electricity state for the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}  State of component c at stage k
xc                     A component state
xNC+1k                 Electricity state at stage k
uck                    Maintenance decision for component c at stage k

Probability functions

λc(i)  Failure probability function for component c

Sets

Ωxc      State space for component c
ΩxNC+1   Electricity state space
Ωuc(ic)  Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component space
The numbers of CM and PM states for component c are NCMc and NPMc, respectively. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether or not to do preventive maintenance, depending on the state of the system:

uck = 0: no preventive maintenance on component c,
uck = 1: preventive maintenance on component c.

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc},
                             ∅ otherwise.
9.2.4.3 Transition Probabilities

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.

Case 1
If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2
If one of the components is in maintenance, or preventive maintenance is decided for some component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with P^c = P(jc, uc, ic) if uc = 1 or ic ∉ {W1, ..., WNWc},
           1 if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic,
           0 else.
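The two cases above can be sketched for a hypothetical two-component system with a deliberately simplified per-component model (one working state, one PM state and one CM state; all probabilities are invented for illustration):

```python
# Sketch of the two transition-probability cases of Section 9.2.4.3 for an
# invented 2-component system. P_c plays the role of the per-component
# transition function of Table 9.1, reduced to states "W", "PM", "CM".

def P_c(j, u, i, lam_c=0.1):
    # toy per-component model: W ages or fails; maintenance ends in W
    if i == "W":
        if u == 1:
            return 1.0 if j == "PM" else 0.0
        return {"W": 1.0 - lam_c, "CM": lam_c}.get(j, 0.0)
    return 1.0 if j == "W" else 0.0        # PM/CM chain ends, back to W

def P_system(j, u, i):
    """Joint transition probability for components in series."""
    all_working = all(ic == "W" for ic in i) and all(uc == 0 for uc in u)
    if all_working:                        # case 1: independent ageing
        p = 1.0
        for jc, uc, ic in zip(j, u, i):
            p *= P_c(jc, uc, ic)
        return p
    p = 1.0                                # case 2: system is down
    for jc, uc, ic in zip(j, u, i):
        if uc == 1 or ic != "W":
            p *= P_c(jc, uc, ic)           # maintenance chain advances
        else:
            p *= 1.0 if jc == ic else 0.0  # working component does not age
    return p

print(P_system(("W", "W"), (0, 0), ("W", "W")))   # 0.9 * 0.9
```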
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑_{c=1}^{NC} Cc

with Cc = CCMc if ic ∈ {CM1, ..., CMNCMc} or jc = CM1,
          CPMc if ic ∈ {PM1, ..., PMNPMc} or jc = PM1,
          0 else.
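The two cost cases can be sketched in the same spirit. All constants are invented, and components are reduced to single W/PM/CM states for brevity; as before, the reward is counted as a negative cost:

```python
# Sketch of the two cost cases of Section 9.2.4.4 for an invented
# 2-component system (all constants assumed for illustration).

G, T_s, C_I = 1000.0, 1.0, 500.0
C_CM = {0: 800.0, 1: 600.0}              # per-component CM cost per stage
C_PM = {0: 200.0, 1: 150.0}              # per-component PM cost per stage

def C_E(s, k):                           # hypothetical price per kWh
    return 0.40

def system_cost(j, u, i, s, k):
    """Reward if everything runs; otherwise C_I plus maintenance costs."""
    if all(ic == "W" for ic in i) and all(uc == 0 for uc in u) \
            and all(jc == "W" for jc in j):
        return -G * T_s * C_E(s, k)      # case 1: reward, counted negative
    total = C_I                          # case 2: system down this stage
    for c, (jc, uc, ic) in enumerate(zip(j, u, i)):
        if ic == "CM" or jc == "CM":
            total += C_CM[c]
        elif ic == "PM" or jc == "PM":
            total += C_PM[c]
    return total

print(system_cost(("W", "W"), (0, 0), ("W", "W"), s=0, k=0))   # -> -400.0
```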
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas that could impact the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid untractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
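The backward recursion above can be checked mechanically. The sketch below encodes the arc costs C(k, i, u) used in the calculation and runs the value iteration backwards, recovering J*(A) = 8 and the optimal first decision u* = 2:

```python
# The backward recursion of Appendix A as executable code.
# C[(k, i, u)] is the arc cost from state i at stage k to successor u,
# taken from the numbers in the calculation above.

C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}

J = {(4, 0): 0}                  # terminal cost phi(0) = 0
policy = {}
for k in range(3, -1, -1):       # stages 3, 2, 1, 0, backwards
    for i in {i for (kk, i, u) in C if kk == k}:
        options = {u: C[(k, i, u)] + J[(k + 1, u)]
                   for (kk, ii, u) in C if (kk, ii) == (k, i)}
        u_star = min(options, key=options.get)
        J[(k, i)], policy[(k, i)] = options[u_star], u_star

print(J[(0, 0)])        # -> 8  (J*(A), as computed above)
print(policy[(0, 0)])   # -> 2  (first move of the shortest path)
```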
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A-H. Mohamed. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
are assumed constant but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.
8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modelled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP model. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.
8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
84 Conclusions
An important assumption of all the models is the loss of memory (Markovian mod-els) The assumption is related to the principle of optimality It means that thetransition probability of the models can depend only on the actual state of thesystem independantly of its history
The finite horizon approach is adapted to short-term optimization From the lit-terature review this approach can be applied to maintenance scheduling I believethat the approach is interesting because it can integrate opportunistic maintenanceChapter 8 gives an example of this type of models A limitations is the consequence
45
of the curse of dimensionality The complexity of the model increases exponention-naly with the number of states In consequence the number of components of afinite horizon SDP model can not be too high for being tractable
Several Markov Decision Process and Semi-Markov Decision Processes models havebeen proposed for solving condition based maintenance problems The models con-siders an average cost-to-go which is realistic SMDP have the advantages of beingable to optimize the time to next inspection depending on the states SMDP arealso more complex The models found in the litterature was considering only singlecomponents with only one state variable SMDP could be very useful for schedulledCBM and SMDP for inspection based CBM However for continuous time moni-toring it would be recommanded to use approximate methods
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
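The learning-from-samples idea above can be made concrete with a small sketch. The four component states, costs and failure probabilities below are hypothetical (not from the thesis), and tabular Q-learning stands in for the general RL machinery: the agent only samples transitions and never sees the failure model.

```python
import random

random.seed(0)

# Hypothetical 4-state component: 0 (new) .. 3 (worn). Actions: 0 = operate, 1 = replace.
N_STATES, ACTIONS = 4, (0, 1)
FAIL_P = [0.05, 0.15, 0.4, 1.0]    # assumed per-stage failure probabilities (hidden from the agent)

def sample_step(s, a):
    """Simulate one stage and return (next_state, cost); the agent only sees samples."""
    if a == 1:
        return 0, 50.0                 # preventive replacement: component is new again
    if random.random() < FAIL_P[s]:
        return 0, 200.0                # failure: corrective maintenance + interruption
    return min(s + 1, 3), -10.0        # survival: production reward, component ages

# Tabular Q-learning on the sampled transitions (cost-minimization form)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.1, 0.95, 0.1
s = 0
for _ in range(200_000):
    a = random.choice(ACTIONS) if random.random() < eps else min(ACTIONS, key=lambda u: Q[s][u])
    s2, cost = sample_step(s, a)
    Q[s][a] += alpha * (cost + gamma * min(Q[s2]) - Q[s][a])
    s = s2

policy = [min(ACTIONS, key=lambda u: Q[i][u]) for i in range(N_STATES)]
# The learned policy typically operates a new component and replaces a worn one.
```

The off-line training mentioned above corresponds to running this sampling loop against a simulator before deployment.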
Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: Value Iteration
  Advantages/Disadvantages: limited state space (number of components)

Markov Decision Processes (stationary model; classical methods, possible approaches for MDP)
  Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI) can converge fast for a high discount factor
  Discounted: short-term maintenance optimization; Policy Iteration (PI) is faster in general
  Shortest path: Linear Programming allows possible additional constraints, but the state space is more limited than with VI & PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces
  Possible application: same as MDP, for larger systems
  Methods: TD-learning, Q-learning; can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval; more complex (average cost-to-go approach)
  Possible application: optimization for inspection based maintenance
  Methods: same as MDP
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, so as to be operational later and avoid maintenance during a profitable period. This idea was taken into account in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers

NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for one component
NCM   Number of corrective maintenance states for one component

Costs

CE(s, k)   Electricity cost at stage k in electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i

Variables

i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage

State and Control Space

x1_k   Component state at stage k
x2_k   Electricity state at stage k

Probability functions

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets

Ω_x1     Component state space
Ω_x2     Electricity state space
Ω_U(i)   Decision space for state i

State notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state
9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the condition at the terminal stage.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

    Xk = (x1_k, x2_k)^T,   x1_k ∈ Ω_x1,  x2_k ∈ Ω_x2        (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one statevariable x1
k There are three types of possible states for the variable Normalstate (W) when the component is working corrective maintenance (CM) states ifthe component is in maintenance due to failure and preventive maintenance (PM)states The meaning of a state is that the component has been in the corresponingcondition during the last stage For example if the component is in a state PMit means that during the last stage it has undertaken preventive maintenance Thenumber of CM and PM states for the component corresponds respectively to NCM
and NPM
To limit the size of the state space it is necessary to limit the number of states WIt can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax) preventivemaintenance is always made Another possibility is to assume that λi(t) staysconstant when age Tmax is reached In this case Tmax can correspond for exampleat the time when λ(t) gt 50 if tgtTmax This approach was implemented Thecorresponding number of W states is NW = TmaxTs or the closest integer in bothcases
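The truncation just described can be sketched as follows; the Weibull failure rate and the parameter values are assumed purely for illustration and are not from the thesis:

```python
# Assumed Weibull failure rate lambda(t) = (beta/eta) * (t/eta)**(beta - 1);
# the parameters and stage length below are hypothetical.
beta, eta = 2.0, 10.0      # shape and scale, in years
Ts = 0.5                   # stage length, in years
T_max = 8.0                # age beyond which lambda(t) is held constant

def rate(t):
    t_eff = min(t, T_max)  # lambda stays constant once T_max is reached
    return (beta / eta) * (t_eff / eta) ** (beta - 1)

N_W = round(T_max / Ts)    # number of working states: W0 .. W_{N_W}
# Per-stage failure probability from state Wq is Ts * lambda(q * Ts)
p_fail = [Ts * rate(q * Ts) for q in range(N_W + 1)]
```

With these hypothetical numbers, N_W = 16 working states suffice, and the capped rate keeps the last failure probabilities constant.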
Figure 9.1: Example of the Markov Decision Process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1. From each working state Wq the component moves to Wq+1 with probability 1 − Ts·λ(q·Ts) and to CM1 with probability Ts·λ(q·Ts); the maintenance states progress with probability 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

    Ω_x1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}
Electricity scenario state
Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3 (electricity prices in SEK/MWh, ranging from about 200 to 500, plotted against the stage for scenarios 1-3).
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

    Uk = 0: no preventive maintenance
    Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

    Ω_U(i) = {0, 1}   if i1 ∈ {W1, ..., WNW}
             ∅        otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

    P(Xk+1 = j | Uk = u, Xk = i)
      = P(x1_k+1 = j1, x2_k+1 = j2 | uk = u, x1_k = i1, x2_k = i2)
      = P(x1_k+1 = j1 | uk = u, x1_k = i1) · P(x2_k+1 = j2 | x2_k = i2)
      = P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E or P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

    i1                          u   j1      P(j1, u, i1)
    Wq, q ∈ {0, ..., NW−1}      0   Wq+1    1 − λ(Wq)
    Wq, q ∈ {0, ..., NW−1}      0   CM1     λ(Wq)
    WNW                         0   WNW     1 − λ(WNW)
    WNW                         0   CM1     λ(WNW)
    Wq, q ∈ {0, ..., NW}        1   PM1     1
    PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   1
    PM(NPM−1)                   ∅   W0      1
    CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   1
    CM(NCM−1)                   ∅   W0      1
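As a sketch, the stationary matrices P(j | i, u) implied by Table 9.1 can be assembled directly; the failure probabilities below are hypothetical, and the sizes are those of Figure 9.1:

```python
# Sketch: the stationary component transition matrices implied by Table 9.1,
# for the sizes of Figure 9.1 (N_W = 4, N_PM = 2, N_CM = 3). The failure
# probabilities lam[q] (standing for Ts * lambda(q * Ts)) are hypothetical.
N_W, N_PM, N_CM = 4, 2, 3
lam = [0.1, 0.2, 0.3, 0.4, 0.5]

PM1 = N_W + 1                       # single PM state (N_PM - 1 = 1)
CM1, CM2 = N_W + 2, N_W + 3         # two CM states (N_CM - 1 = 2)
n = N_W + 1 + (N_PM - 1) + (N_CM - 1)

P0 = [[0.0] * n for _ in range(n)]  # u = 0: no preventive maintenance
for q in range(N_W + 1):
    P0[q][min(q + 1, N_W)] = 1.0 - lam[q]   # W_NW loops on itself
    P0[q][CM1] = lam[q]
P0[PM1][0] = 1.0                    # PM_{N_PM-1} -> W0
P0[CM1][CM2] = 1.0                  # CM chain progresses deterministically
P0[CM2][0] = 1.0                    # CM_{N_CM-1} -> W0

P1 = [row[:] for row in P0]         # u = 1: replacement from any working state
for q in range(N_W + 1):
    P1[q] = [0.0] * n
    P1[q][PM1] = 1.0

assert all(abs(sum(row) - 1.0) < 1e-9 for row in P0 + P1)
```

The final assertion checks that every row of both matrices is a proper probability distribution, which is a useful sanity check when building such models by hand.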
Table 9.2: Example of transition matrices for electricity scenarios

    P1_E = [ 1    0    0  ]     P2_E = [ 1/3  1/3  1/3 ]     P3_E = [ 0.6  0.2  0.2 ]
           [ 0    1    0  ]            [ 1/3  1/3  1/3 ]            [ 0.2  0.6  0.2 ]
           [ 0    0    1  ]            [ 1/3  1/3  1/3 ]            [ 0.2  0.2  0.6 ]
Table 9.3: Example of transition probabilities on a 12-stage horizon

    Stage (k)     0     1     2     3     4     5     6     7     8     9     10    11
    Pk(j2, i2)    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

    i1                          u   j1      Ck(j, u, i)
    Wq, q ∈ {0, ..., NW−1}      0   Wq+1    G · Ts · CE(i2, k)
    Wq, q ∈ {0, ..., NW−1}      0   CM1     CI + CCM
    WNW                         0   WNW     G · Ts · CE(i2, k)
    WNW                         0   CM1     CI + CCM
    Wq                          1   PM1     CI + CPM
    PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   CI + CPM
    PM(NPM−1)                   ∅   W0      CI + CPM
    CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   CI + CCM
    CM(NCM−1)                   ∅   W0      CI + CCM
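To illustrate how the one-component model would be solved, the following sketch runs backward induction (finite horizon value iteration) on a simplified instance: the electricity state is dropped, the production reward is entered as a constant negative cost, and all numbers are hypothetical.

```python
# Simplified one-component instance: states W0..W3 (indices 0..3), PM1 (4),
# CM1 (5), CM2 (6); electricity state dropped, production reward constant.
# All numbers are hypothetical.
N = 12                                    # number of stages
N_W, PM1, CM1, CM2, n = 3, 4, 5, 6, 7
lam = [0.05, 0.1, 0.2, 0.4]               # per-stage failure probability in Wq
REWARD, C_I, C_PM, C_CM = -10.0, 5.0, 20.0, 60.0

def transitions(i, u):
    """(next state, probability, stage cost) triples, following Tables 9.1 and 9.4."""
    if i <= N_W:                                       # working states
        if u == 1:
            return [(PM1, 1.0, C_I + C_PM)]
        return [(min(i + 1, N_W), 1.0 - lam[i], REWARD), (CM1, lam[i], C_I + C_CM)]
    if i == PM1:
        return [(0, 1.0, C_I + C_PM)]                  # PM finishes -> W0
    if i == CM1:
        return [(CM2, 1.0, C_I + C_CM)]                # CM progresses
    return [(0, 1.0, C_I + C_CM)]                      # CM2 -> W0

J = [0.0] * n                                          # zero terminal cost C_N(i)
policy = []
for k in range(N - 1, -1, -1):
    Jk, uk = [0.0] * n, [0] * n
    for i in range(n):
        options = [0, 1] if 0 < i <= N_W else [0]      # replacement only from W1..W_NW
        cost = lambda u: sum(p * (c + J[j]) for j, p, c in transitions(i, u))
        uk[i] = min(options, key=cost)
        Jk[i] = cost(uk[i])
    J, policy = Jk, [uk] + policy
# J[0] is now the optimal expected cost-to-go of a new component over N stages.
```

Adding the electricity scenario back simply enlarges the state to the pair (x1, x2) and makes the stage cost and the transition matrix stage-dependent; the backward loop itself is unchanged.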
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c

Costs

CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}    State of component c at the current stage
iNC+1                   Electricity state at the current stage
jc, c ∈ {1, ..., NC}    State of component c at the next stage
jNC+1                   Electricity state at the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
xNC+1_k                  Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k

Probability functions

λc(i)   Failure probability function for component c

Sets

Ω_xc        State space for component c
Ω_xNC+1     Electricity state space
Ω_uc(ic)    Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component in order to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

    Xk = (x1_k, ..., xNC_k, xNC+1_k)^T        (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.

Component space

The numbers of CM and PM states for component c are NCMc and NPMc respectively. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ω_xc:

    xc_k ∈ Ω_xc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity space

Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

    uc_k = 0: no preventive maintenance on component c
    uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

    Uk = (u1_k, u2_k, ..., uNC_k)^T        (9.3)

The decision space for each decision variable can be defined by:

    ∀c ∈ {1, ..., NC}:  Ω_uc(ic) = {0, 1}   if ic ∈ {W0, ..., WNWc}
                                   ∅        otherwise
9.2.4.3 Transition Probabilities

The state variables xc are independent of the electricity state xNC+1. Consequently,

    P(Xk+1 = j | Uk = U, Xk = i)                                                  (9.4)
      = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)      (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc} and uc = 0:

    P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1..NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or preventive maintenance is decided, then:

    P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1..NC} P^c

with

    P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
          1              if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic
          0              otherwise
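The two cases can be sketched as a joint probability function; the per-component transition functions, the state labels and the treatment of W0 as a working state are assumptions of this sketch, not definitions from the thesis:

```python
from math import prod

# State labels: ('W', q), ('PM', q), ('CM', q). The per-component transition
# functions P_list[c](j, u, i) are assumed given (e.g. built as in Table 9.1);
# treating W0..W_NWc as the working states is an assumption of this sketch.
def working(state, n_w):
    kind, q = state
    return kind == 'W' and 0 <= q <= n_w

def joint_prob(P_list, N_Wc, i, u, j):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) following Cases 1 and 2."""
    system_up = all(working(ic, N_Wc[c]) for c, ic in enumerate(i)) and not any(u)
    if system_up:                                  # Case 1: components age independently
        return prod(P_list[c](j[c], 0, i[c]) for c in range(len(i)))
    factors = []
    for c in range(len(i)):                        # Case 2: the system is down this stage
        if u[c] == 1 or not working(i[c], N_Wc[c]):
            factors.append(P_list[c](j[c], 1, i[c]))      # maintained component progresses
        else:
            factors.append(1.0 if j[c] == i[c] else 0.0)  # working components are frozen
    return prod(factors)
```

The "frozen" branch is what couples the components: a working component keeps its exact state while the system is down, instead of ageing independently.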
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc}:

    C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

    C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1..NC} Cc

with

    Cc = CCMc   if ic ∈ {CM1, ..., CM(NCMc−1)} or jc = CM1
         CPMc   if ic ∈ {PM1, ..., PM(NPMc−1)} or jc = PM1
         0      otherwise
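Correspondingly, the two-case stage cost can be sketched as below; all cost values are hypothetical, CE is a placeholder price function, and entering the production reward as a negative cost is an assumption of this sketch:

```python
# Hypothetical costs; CE is a placeholder price function and the production
# reward is entered as a negative cost. States use the labels ('W', q),
# ('PM', q), ('CM', q) as in the transition sketch.
def stage_cost(i, u, j, k, G=1000.0, Ts=1.0, CE=lambda s, k: 0.05,
               C_I=50.0, C_CM=(60.0, 80.0), C_PM=(20.0, 30.0), i_elec=0):
    in_cm = [ic[0] == 'CM' or jc == ('CM', 1) for ic, jc in zip(i, j)]
    in_pm = [ic[0] == 'PM' or jc == ('PM', 1) for ic, jc in zip(i, j)]
    if not any(in_cm) and not any(in_pm) and not any(u):
        return -G * Ts * CE(i_elec, k)       # Case 1: production reward only
    total = C_I                              # Case 2: interruption plus maintenance costs
    for c in range(len(i)):
        if in_cm[c]:
            total += C_CM[c]
        elif in_pm[c]:
            total += C_PM[c]
    return total
```

Note that a single interruption cost CI is charged regardless of how many components are maintained; this is exactly what makes grouping maintenance actions (opportunistic maintenance) attractive in this model.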
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas of issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, rather than an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the time to the next maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP could, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states, to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
    J*_4(0) = φ(0) = 0

Stage 3:
    J*_3(0) = J*(H) = C(3, 0, 0) = 4,    u*_3(0) = u*(H) = 0
    J*_3(1) = J*(I) = C(3, 1, 0) = 2,    u*_3(1) = u*(I) = 0
    J*_3(2) = J*(J) = C(3, 2, 0) = 7,    u*_3(2) = u*(J) = 0

Stage 2:
    J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
    u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
    J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
    u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
    J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
    u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
    J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
    u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
    J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
    u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
    J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
    u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
    J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
    u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
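The recursion above can be checked programmatically; the arc costs below are exactly those appearing in the calculation:

```python
# Arc costs C[k][(i, j)] read off the calculation above; states per stage are
# A | B, C, D | E, F, G | H, I, J | terminal node.
C = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},                               # stage 0: from A
    {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3,
     (2, 1): 5, (2, 2): 2},                                          # stage 1: from B, C, D
    {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2,
     (2, 1): 1, (2, 2): 2},                                          # stage 2: from E, F, G
    {(0, 0): 4, (1, 0): 2, (2, 0): 7},                               # stage 3: from H, I, J
]

J = {0: 0.0}                              # terminal cost phi(0) = 0
for k in range(3, -1, -1):                # backward value iteration
    Jk = {}
    for (i, j), c in C[k].items():
        Jk[i] = min(Jk.get(i, float('inf')), c + J[j])
    J = Jk

print(J[0])  # -> 8.0, the optimal cost J*(A)
```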
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] SV Amari and LH Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.

[5] YW Archibald and R Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I Bagai and K Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] RE Barlow and F Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C Berenguer, C Chu and A Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M Berg and B Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M Berg and B Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L Bertling, R Allan and R Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] DP Bertsekas and JN Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] GK Chan and S Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] DI Cho and M Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R Dekker, RE Wildeman and FA van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C Fu, L Ye, Y Liu, R Yu, B Iung, Y Cheng and Y Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A Haurie and P L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P Hilber and L Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A Jayakumar and S Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y Jiang, Z Zhong, J McCalley and TV Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] LP Kaelbling, ML Littman and AP Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D Kalles, A Stathaki and RE King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D Kumar and U Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P L'Ecuyer and A Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] ML Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y Mansour and S Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] MKC Marwali and SM Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] RP Nicolai and R Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J Nilsson and L Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] KS Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] KS Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Alagar Rangan, Dimple Ahyagarajan, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
8.2.2 Semi-Markov Decision Process
Many condition-based maintenance models based on SMDP have been proposed in recent years.
Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are not only minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.
The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.
8.3 Reinforcement Learning
Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantage put forward is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.
8.4 Conclusions
An important assumption of all the models is the loss of memory (Markovian models). This assumption is related to the principle of optimality. It means that the transition probabilities of the models depend only on the current state of the system, independently of its history.
The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require a model of the system to exist: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
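The sample-based idea discussed above can be made concrete with a tabular Q-learning update. This is only an illustrative sketch with hypothetical state/action encodings, not an implementation from the thesis: the update uses a single sampled transition (s, u, cost, s'), with no explicit transition model.

```python
# Minimal tabular Q-learning update for a cost-minimization setting
# (hypothetical states/actions; alpha is the learning rate, gamma the
# discount factor).  Only a sampled transition is needed, no model.
def q_update(Q, s, u, cost, s_next, actions, alpha=0.1, gamma=0.95):
    target = cost + gamma * min(Q[(s_next, a)] for a in actions)
    Q[(s, u)] += alpha * (target - Q[(s, u)])
    return Q[(s, u)]
```

Repeated over many simulated or observed transitions, Q converges under standard conditions to the optimal cost-to-go, from which a greedy maintenance policy can be read off.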
Table 8.1 shows a summary of the models and the most important methods.
Table 8.1: Summary of models and methods (characteristics, possible application in maintenance optimization, methods, advantages/disadvantages)

Finite Horizon Dynamic Programming
• Characteristics: the model can be non-stationary
• Possible application: short-term maintenance scheduling
• Method: value iteration
• Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
• Characteristics: stationary model; average cost-to-go, discounted, or shortest-path formulations
• Possible applications: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
• Methods (classical): value iteration (VI), can converge fast for a high discount factor; policy iteration (PI), faster in general; linear programming, allows additional constraints but a more limited state space than VI and PI

Approximate Dynamic Programming for MDP
• Characteristics: can handle large state spaces
• Possible application: same as MDP, for larger systems
• Methods: TD-learning, Q-learning
• Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
• Characteristics: can optimize the inspection interval
• Possible application: optimization for inspection-based maintenance
• Methods: same as MDP (average cost-to-go approach)
• Advantages/disadvantages: more complex
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.
The proposed model is first presented for one component and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.
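As a sketch of how value iteration (backward induction) proceeds on such a finite horizon model, consider the following generic routine; the callables P, C and C_N stand for the transition probabilities, transition costs and terminal cost defined later in this chapter, and all names are illustrative assumptions.

```python
# Finite-horizon value iteration (backward induction) sketch.
# P(k, i, u)   -> {j: probability} for stage k, state i, decision u
# C(k, i, u, j) -> transition cost;  C_N(i) -> terminal cost
def backward_induction(states, decisions, P, C, C_N, N):
    J = {i: C_N(i) for i in states}       # stage-N cost-to-go
    policy = [None] * N
    for k in range(N - 1, -1, -1):        # move backwards in time
        Jk, uk = {}, {}
        for i in states:
            Jk[i], uk[i] = min(
                (sum(p * (C(k, i, u, j) + J[j])
                     for j, p in P(k, i, u).items()), u)
                for u in decisions(i))
        J, policy[k] = Jk, uk
    return J, policy                      # J[i]: optimal cost from stage 0
```

The returned policy is a list of one decision rule per stage, which is exactly the non-stationary policy a finite horizon model calls for.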
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, to ease the understanding of its principle.
The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and to avoid maintenance in a profitable period. This idea was incorporated in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.
There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE   Number of electricity scenarios
NW   Number of working states for the component
NPM  Number of preventive maintenance states for the component
NCM  Number of corrective maintenance states for the component
Costs
CE(s, k)  Electricity price at stage k in electricity scenario s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i
Variables
i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage
State and Control Space
x1k  Component state at stage k
x2k  Electricity state at stage k
Probability function
λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi
Sets
Ωx1    Component state space
Ωx2    Electricity state space
ΩU(i)  Decision space for state i
States notations
W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.
• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.
• If the system is not working, an interruption cost CI per stage is considered.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. The electricity price may switch from one scenario to another during the time span; the probability of transition at each stage is assumed known.
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• Manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2  (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that it has undergone preventive maintenance during the last stage. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) exceeds a fixed threshold (e.g. 50%) for t > Tmax. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
Figure 9.1: Example of the Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4 (solid lines: u = 0; dashed lines: u = 1). Under u = 0, each working state Wq moves to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q) (W4 loops on itself with probability 1 − Ts·λ(4)); under u = 1, every Wq moves to PM1 with probability 1. The maintenance chains CM1 → CM2 → W0 and PM1 → W0 are traversed with probability 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state. More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with one state variable, x2k. There are NE possible states for this variable, each corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserves in a country such as Sweden. Hydropower provides a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3 (electricity prices in SEK/MWh over stages k−1, k, k+1 for scenarios 1, 2 and 3).
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and ΩU(i) = ∅ otherwise.
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).
The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.
Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if NPM = 1 or NCM = 1, then PM1 (respectively CM1) corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                          u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0    CM1      λ(Wq)
WNW                         0    WNW      1 − λ(WNW)
WNW                         0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1    PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    1
PMNPM−1                     ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    1
CMNCM−1                     ∅    W0       1
Table 9.2: Example of transition matrices for the electricity scenarios

P1E = [ 1    0    0  ]    P2E = [ 1/3  1/3  1/3 ]    P3E = [ 0.6  0.2  0.2 ]
      [ 0    1    0  ]          [ 1/3  1/3  1/3 ]          [ 0.2  0.6  0.2 ]
      [ 0    0    1  ]          [ 1/3  1/3  1/3 ]          [ 0.2  0.2  0.6 ]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
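Tables 9.2 and 9.3 amount to a per-stage schedule of matrices; a minimal sketch (matrix values taken from Table 9.2, all other names illustrative) could look like this:

```python
# Stage-dependent electricity-scenario transition matrices (Tables 9.2-9.3).
# Rows index the current scenario i2, columns the next scenario j2.
P1_E = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]              # stable: scenarios never change
P2_E = [[1/3, 1/3, 1/3]] * 3          # fully transient (e.g. during summer)
P3_E = [[0.6, 0.2, 0.2],
        [0.2, 0.6, 0.2],
        [0.2, 0.2, 0.6]]              # mildly transient
schedule = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E,
            P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]   # one matrix per stage, N = 12

def P_k(k, i2, j2):
    """Electricity transition probability P_k(j2, i2) at stage k."""
    return schedule[k][i2][j2]
```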
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: CCM or CPM
• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. A possible terminal cost is defined by CN(i) for each possible terminal state i of the component. The transition costs are summarized in Table 9.4; notice that i2 is a state variable.
Table 9.4: Transition costs

i1                          u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0    CM1      CI + CCM
WNW                         0    WNW      G · Ts · CE(i2, k)
WNW                         0    CM1      CI + CCM
Wq                          1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    CI + CPM
PMNPM−1                     ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    CI + CCM
CMNCM−1                     ∅    W0       CI + CCM
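The entries of Table 9.4 can be sketched as a cost function. All names are illustrative assumptions; the generation reward is returned as a negative cost so that a minimizing value iteration treats it correctly.

```python
# One-component transition cost of Table 9.4 (sketch, state encoding as in
# Table 9.1: 'W..', 'PM..', 'CM..' strings).
def stage_cost(i1, u, j1, i2, k, G, Ts, C_E, C_I, C_PM, C_CM):
    if i1.startswith("W") and u == 0 and j1 != "CM1":
        return -G * Ts * C_E(i2, k)       # reward for the produced energy
    if i1.startswith("CM") or j1 == "CM1":
        return C_I + C_CM                 # failure or ongoing repair
    return C_I + C_PM                     # preventive replacement in progress
```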
9.2 Multi-Component Model
In this section the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.
This could be very interesting if the interruption cost is high, or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC    Number of components
NWc   Number of working states for component c
NPMc  Number of preventive maintenance states for component c
NCMc  Number of corrective maintenance states for component c
Costs
CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}  State of component c at the current stage
iNC+1                 Electricity state at the current stage
jc, c ∈ {1, ..., NC}  State of component c for the next stage
jNC+1                 Electricity state for the next stage
uc, c ∈ {1, ..., NC}  Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}  State of component c at stage k
xc                     A component state
xNC+1,k                Electricity state at stage k
uck                    Maintenance decision for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc      State space for component c
ΩxNC+1   Electricity state space
Ωuc(ic)  Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.
• It is possible at each stage to decide to replace a component in order to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.
• An interruption cost CI is considered whenever any maintenance is done on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)  (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)  (9.3)
The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and Ωuc(ic) = ∅ otherwise.
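The set of admissible decision vectors Uk can be enumerated componentwise; in the sketch below (illustrative names), the empty decision space of a maintenance state is encoded as the single dummy decision 0, meaning "no choice".

```python
from itertools import product

# Enumerate all admissible decision vectors U_k for a multi-component state.
def decision_vectors(state_vec):
    per_component = [
        (0, 1) if s.startswith("W") else (0,)   # W states: replace or not
        for s in state_vec                       # PM/CM states: no choice
    ]
    return list(product(*per_component))
```

For example, decision_vectors(["W1", "CM1"]) yields [(0, 0), (1, 0)]: the working component may or may not be replaced, while the failed one offers no decision.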
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)  (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)  (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component states transitions
The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.
Case 1
If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of the components considered independently:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0,

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)
Case 2
If one of the components is in maintenance, or if a preventive maintenance decision is taken,

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC P^c

with

P^c = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1             if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
P^c = 0             otherwise

The second line expresses that a working component that is not maintained does not age while the system is down.
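Cases 1 and 2 can be combined into one joint probability. This is a sketch with an assumed helper P_c(c, i, u) returning the one-component transition dictionary of component c (as in Table 9.1); working states are the W labels.

```python
# Joint component-state transition probability (Cases 1 and 2).
# While the system is down, unmaintained working components are frozen.
def joint_prob(P_c, i_vec, u_vec, j_vec):
    working = [s.startswith("W") for s in i_vec]
    if all(working) and not any(u_vec):           # Case 1: all components age
        p = 1.0
        for c, (ic, jc) in enumerate(zip(i_vec, j_vec)):
            p *= P_c(c, ic, 0).get(jc, 0.0)
        return p
    p = 1.0                                       # Case 2: system is down
    for c, (ic, uc, jc) in enumerate(zip(i_vec, u_vec, j_vec)):
        if uc == 1 or not working[c]:             # maintained: state advances
            p *= P_c(c, ic, uc).get(jc, 0.0)
        else:                                     # working, not maintained:
            p *= 1.0 if jc == ic else 0.0         # state (age) is frozen
    return p
```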
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.
Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the produced electricity is obtained:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc},

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)
Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is incurred, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑c=1..NC Cc

with

Cc = CCMc  if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
Cc = CPMc  if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
Cc = 0     otherwise
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:
• Manpower: it would be interesting to limit the number of maintenance actions that can be performed at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Include other types of maintenance actions: in the model, replacement is the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is shown empirically to converge faster; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model is to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
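The recursion above can be verified in a few lines of code, with the arc costs C[(k, i, j)] transcribed from the computation above:

```python
# Verify the Appendix A shortest-path solution by backward induction.
C = {(0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
     (1, 0, 0): 4, (1, 0, 1): 6,
     (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
     (1, 2, 1): 5, (1, 2, 2): 2,
     (2, 0, 0): 2, (2, 0, 1): 5,
     (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
     (2, 2, 1): 1, (2, 2, 2): 2,
     (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7}
N = 4
J = {(N, 0): 0}                       # terminal node at stage 4
for k in range(N - 1, -1, -1):        # stages 3, 2, 1, 0
    for i in sorted({i for (kk, i, _) in C if kk == k}):
        J[(k, i)] = min(c + J[(k + 1, j)]
                        for (kk, ii, j), c in C.items()
                        if kk == k and ii == i)
print(J[(0, 0)])   # prints 8, the optimal cost from node A
```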
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS '06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers / Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Bérenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of the 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
66
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: an opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
67
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L. M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
68
of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. As a consequence, the number of components in a finite horizon SDP model cannot be too high if the model is to remain tractable.
Several Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, the use of approximate methods is recommended.
Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
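To make the learning idea concrete, the sketch below runs tabular Q-learning on a tiny, invented two-state repair problem (working/failed). All rewards, costs and parameters are purely illustrative and are not taken from any model in this thesis:

```python
import random

# Tabular Q-learning on an invented two-state repair problem.
# State 0 = working, 1 = failed.  Action 0 = operate, 1 = replace.
# Operating a working unit earns 2 per stage and fails with probability 0.2;
# replacement costs 3 and returns the unit to the working state.
random.seed(0)

def step(s, a):
    if a == 1:                                   # replace
        return 0, -3.0
    if s == 1:                                   # operating while failed earns nothing
        return 1, 0.0
    return (1, 2.0) if random.random() < 0.2 else (0, 2.0)

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
alpha, gamma, eps = 0.1, 0.9, 0.1
s = 0
for _ in range(50000):
    if random.random() < eps:                    # epsilon-greedy exploration
        a = random.choice((0, 1))
    else:
        a = max((0, 1), key=lambda act: Q[(s, act)])
    s2, r = step(s, a)
    # Q-learning update: bootstrap on the greedy value of the next state
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
    s = s2

policy = {st: max((0, 1), key=lambda act: Q[(st, act)]) for st in (0, 1)}
print(policy)
```

With these numbers the value of replacing a failed unit is positive, and the learned greedy policy typically keeps operating while the unit works and replaces it after a failure; no explicit transition model was needed.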
Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
• Characteristics: the model can be non-stationary
• Possible application in maintenance optimization: short-term maintenance scheduling
• Method: value iteration
• Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes (stationary model; classical methods) - possible approaches for MDP:
• Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI): can converge fast for a high discount factor
• Discounted: short-term maintenance optimization; Policy Iteration (PI): faster in general
• Shortest path: Linear Programming: possible additional constraints; state space limited (VI & PI)

Approximate Dynamic Programming for MDP
• Characteristics: can handle large state spaces compared with classical MDP methods
• Possible application: same as MDP, for larger systems
• Methods: TD-learning, Q-learning
• Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
• Characteristics: can optimize the inspection interval
• Possible application: optimization for inspection-based maintenance
• Method: same as MDP (average cost-to-go approach)
• Advantages/disadvantages: complex
46
Chapter 9
A Proposed Finite Horizon Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model
9.1.1 Idea of the Model
In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and to avoid maintenance during a profitable period. This idea was incorporated into the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.
There can be transitions from one scenario to another depending on the period ofthe year
In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE: Number of electricity scenarios
NW: Number of working states for the component
NPM: Number of preventive maintenance states for one component
NCM: Number of corrective maintenance states for one component
Costs
CE(s, k): Electricity cost at stage k for the electricity state s
CI: Cost per stage for interruption
CPM: Cost per stage of preventive maintenance
CCM: Cost per stage of corrective maintenance
CN(i): Terminal cost if the component is in state i
Variables
i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage
State and Control Space
48
x1k: Component state at stage k
x2k: Electricity state at stage k
Probability function
λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state Wi
Sets
Ωx1: Component state space
Ωx2: Electricity state space
ΩU(i): Decision space for state i
States notations
W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.
• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).
• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.
• At each stage it is possible to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.
• If the system is not working, a cost for interruption CI per stage is considered.
• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.
49
• A terminal cost (for stage N) can be used to penalize the terminal stage condition.
• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).
The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1,  x2k ∈ Ωx2   (9.1)
Ωx1 is the set of possible states for the component and Ωx2 the set of possibleelectricity scenarios
Component state
The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.
To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case Tmax can correspond, for example, to the time at which λ(t) > 50%. This second approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
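As a small illustration of this truncation rule, the sketch below walks the stage grid until the hazard crosses a chosen limit. The Weibull-shaped hazard and all numbers are invented for the example, since the chapter does not fix a particular failure rate function:

```python
# Truncating the age axis: W states are kept up to the age Tmax at which the
# hazard crosses a chosen limit.  A Weibull hazard is assumed for illustration.
def hazard(t, beta=3.0, eta=10.0):
    return (beta / eta) * (t / eta) ** (beta - 1)

Ts = 1.0           # stage length (illustrative)
lam_max = 0.5      # hazard limit defining Tmax (illustrative)

t = 0.0
while hazard(t + Ts) < lam_max:   # walk the stage grid until the limit is hit
    t += Ts
Tmax = t + Ts
NW = round(Tmax / Ts)             # number of W states, NW = Tmax / Ts
print(NW, Tmax)
```

With these particular numbers the hazard first exceeds 0.5 at age 13, giving NW = 13 working states.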
50
[Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines (u = 0): from each state Wq there is a transition to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); W4 stays in W4 with probability 1 − Ts·λ(4). Dashed lines (u = 1): from each Wq to PM1. The maintenance chains CM1 → CM2 → W0 and PM1 → W0 have probability 1.]
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
51
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively a dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydro power accounts for a large part of the electricity generation in Sweden and is, moreover, a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2: Example of electricity scenarios, NE = 3. The figure plots the electricity price (SEK/MWh, approximately 200–500) against the stages k−1, k, k+1 for Scenarios 1–3.]
52
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:
Uk = 0 no preventive maintenance
Uk = 1 preventive maintenance
The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and ΩU(i) = ∅ otherwise.
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,
P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).
The transition probability for the component state is stationary It can be repre-sented as a Markov decision process as in the example in Figure 91
Table 9.1 summarizes the transition probabilities that are not equal to zero.
Note that if NPM = 1 or NCM = 1, then PM1 (respectively CM1) corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
53
Table 9.1: Transition probabilities

i1 | u | j1 | P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | λ(Wq)
WNW | 0 | WNW | 1 − λ(WNW)
WNW | 0 | CM1 | λ(WNW)
Wq, q ∈ {0, ..., NW} | 1 | PM1 | 1
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | 1
PMNPM−1 | ∅ | W0 | 1
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | 1
CMNCM−1 | ∅ | W0 | 1
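The table can be turned into a stationary transition matrix mechanically. The sketch below assembles the rows of P(j | u, i) for the example of Figure 9.1 (NW = 4, NPM = 2, NCM = 3); the per-stage failure probabilities in `lam` are invented for the illustration:

```python
# Assembling the stationary component transition matrix of Table 9.1 for the
# example of Figure 9.1: NW = 4, NPM = 2, NCM = 3.
NW, NPM, NCM = 4, 2, 3
states = ([f"W{q}" for q in range(NW + 1)]
          + [f"PM{q}" for q in range(1, NPM)]
          + [f"CM{q}" for q in range(1, NCM)])
idx = {s: i for i, s in enumerate(states)}
lam = [0.05, 0.10, 0.20, 0.35, 0.50]   # lambda(Wq) for q = 0..NW (illustrative)

def transition_row(state, u):
    """One row of P(j | u, i): u = 0 operate, u = 1 replace, None forced."""
    row = [0.0] * len(states)
    if state.startswith("W"):
        q = int(state[1:])
        if u == 1:                                  # preventive replacement
            row[idx["PM1"]] = 1.0
        else:                                       # operate: age or fail
            row[idx[f"W{min(q + 1, NW)}"]] = 1.0 - lam[q]
            row[idx["CM1"]] = lam[q]
    elif state in (f"PM{NPM - 1}", f"CM{NCM - 1}"):
        row[idx["W0"]] = 1.0                        # maintenance ends: as good as new
    elif state.startswith("PM"):
        row[idx[f"PM{int(state[2:]) + 1}"]] = 1.0   # PM chain continues
    else:
        row[idx[f"CM{int(state[2:]) + 1}"]] = 1.0   # CM chain continues
    return row

P0 = [transition_row(s, 0 if s.startswith("W") else None) for s in states]
print(P0[idx["W4"]][idx["W4"]])
```

Every row sums to one, and the last working state W4 loops on itself with probability 1 − λ(W4), as in the last W-row of the table.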
Table 9.2: Example of transition matrices for the electricity scenarios

P1E = [1 0 0; 0 1 0; 0 0 1]
P2E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]
P3E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):   0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):  P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
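A sketch of how the stage-dependent electricity transitions of Tables 9.2 and 9.3 can be encoded (the matrix values are those of the example; the function name is ours):

```python
# Stage-dependent electricity-scenario transitions, encoding the matrices of
# Table 9.2 and the stage schedule of Table 9.3.
P1E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2E = [[1 / 3] * 3 for _ in range(3)]
P3E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def electricity_transition(k, i2, j2):
    """Pk(j2, i2): probability of moving from scenario i2 to j2 at stage k."""
    return schedule[k][i2][j2]

# every row of every matrix is a probability distribution
assert all(abs(sum(row) - 1.0) < 1e-12 for P in (P1E, P2E, P3E) for row in P)
print(electricity_transition(3, 0, 1))
```

The schedule reflects the modelling idea of Section 9.1.1: identity matrices (stable scenario) outside the summer, mixing matrices (transient scenario) during it.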
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:
• Reward for electricity generation = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: CCM or CPM
• Cost for interruption: CI
Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.
54
Table 9.4: Transition costs

i1 | u | j1 | Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | CI + CCM
WNW | 0 | WNW | G · Ts · CE(i2, k)
WNW | 0 | CM1 | CI + CCM
Wq | 1 | PM1 | CI + CPM
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | CI + CPM
PMNPM−1 | ∅ | W0 | CI + CPM
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | CI + CCM
CMNCM−1 | ∅ | W0 | CI + CCM
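Putting the pieces together, the following condensed sketch runs the backward induction (finite horizon value iteration) on a deliberately tiny instance: NW = 1 (ages W0 and W1) and NPM = NCM = 1, so that maintenance takes a single stage and leads back to W0, and the production reward enters as a negative cost. All numbers are invented for the illustration:

```python
# Condensed backward induction for a scaled-down one-component instance:
# ages a in {0, 1} (W0, W1) and electricity scenarios s in {0, 1}.
lam = {0: 0.1, 1: 0.4}              # per-stage failure probability by age
CE = {0: 0.3, 1: 0.6}               # electricity price per kWh by scenario
Pel = [[0.8, 0.2], [0.3, 0.7]]      # scenario transition matrix
G_Ts = 10.0                          # energy produced per stage, kWh
CI, CPM, CCM = 100.0, 50.0, 200.0   # interruption and maintenance costs

def solve(N):
    """Return (J0, policy): costs-to-go at stage 0 and the optimal decisions."""
    J = {(a, s): 0.0 for a in (0, 1) for s in (0, 1)}   # zero terminal cost
    policy = {}
    for k in reversed(range(N)):
        Jk = {}
        for a in (0, 1):
            for s in (0, 1):
                cont = lambda nxt: sum(Pel[s][s2] * J[(nxt, s2)] for s2 in (0, 1))
                # u = 0: operate; success ages the unit (capped at W1),
                # failure incurs CI + CCM and repairs the unit to W0
                c0 = (1 - lam[a]) * (-G_Ts * CE[s] + cont(min(a + 1, 1))) \
                     + lam[a] * (CI + CCM + cont(0))
                # u = 1: preventive replacement for one stage, back to W0
                c1 = CI + CPM + cont(0)
                Jk[(a, s)] = min(c0, c1)
                policy[(k, a, s)] = 0 if c0 <= c1 else 1
        J = Jk
    return J, policy

J0, policy = solve(6)
print(J0[(0, 0)])
```

For N = 1 the recursion reduces to a single comparison per state, e.g. J0(a=0, s=0) = min{0.9 · (−3) + 0.1 · 300, 150} = 27.3, which is a convenient hand-check.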
9.2 Multi-Component Model
In this section the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.

This can be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers
NC: Number of components
NWc: Number of working states for component c
NPMc: Number of preventive maintenance states for component c
NCMc: Number of corrective maintenance states for component c
55
Costs
CPMc: Cost per stage of preventive maintenance for component c
CCMc: Cost per stage of corrective maintenance for component c
CNc(i): Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}: State of component c at the current stage
iNC+1: Electricity state at the current stage
jc, c ∈ {1, ..., NC}: State of component c for the next stage
jNC+1: Electricity state for the next stage
uc, c ∈ {1, ..., NC}: Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}: State of component c at stage k
xc: A component state
xNC+1k: Electricity state at stage k
uck: Maintenance decision for component c at stage k
Probability functions
λc(i): Failure probability function for component c
Sets
Ωxc: State space for component c
ΩxNC+1: Electricity state space
Ωuc(ic): Decision space for component c in state ic
9.2.3 Assumptions
• The system is composed of NC components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.
• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.
• At each stage it is possible to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.
56
• An interruption cost CI is considered whenever maintenance is carried out on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).
• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC} represents the state of component c, and xNC+1k represents the electricity state.
Component space:
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}
Electricity space:
Same as in Section 9.1.4.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:
57
uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector, as in (9.3):

Uk = (u1k, u2k, ..., uNCk)   (9.3)
The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and Ωuc(ic) = ∅ otherwise.
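The joint decision space can be enumerated directly. The helper below is an illustrative sketch in which the empty decision set of a component in maintenance is represented by a single forced no-op (None); the function name and state labels are ours:

```python
from itertools import product

# Enumerating the joint decision space: a component in a working state
# contributes the choices {0, 1}; a component in maintenance has the empty
# decision set, represented here by a single forced no-op (None).
def decision_space(component_states):
    per_component = [(0, 1) if s.startswith("W") else (None,)
                     for s in component_states]
    return list(product(*per_component))

U = decision_space(["W2", "CM1", "W0"])
print(len(U))
```

The number of joint decisions grows as 2 raised to the number of working components, which is one face of the curse of dimensionality discussed earlier.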
9.2.4.3 Transition Probabilities
The state variables xc are independent of the electricity state xNC+1. Consequently,
P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.4.3.
Component state transitions

The state variables xc are, however, not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.
Case 1:
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc},

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)
Case 2:
If one of the components is in maintenance, or preventive maintenance is decided for some component:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC Pc

with Pc = P(jc, 1, ic) if uc = 1 or ic ∉ {W1, ..., WNWc}; otherwise (a working component left idle does not age) Pc = 1 if jc = ic, and Pc = 0 else.
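The two cases can be checked numerically. The sketch below implements the product rule for a hypothetical two-component system with states W0, W1, CM1 per component (NPMc = 1, so replacement leads directly back to W0); all probabilities are invented:

```python
from itertools import product
from math import prod

# Joint component transitions for two components, each with states W0, W1, CM1.
lam = {"W0": 0.1, "W1": 0.3}          # per-stage failure probabilities
STATES = ("W0", "W1", "CM1")

def P_single(j, u, i):
    if i == "CM1":                    # repair finishes: back to as good as new
        return 1.0 if j == "W0" else 0.0
    if u == 1:                        # preventive replacement (one stage)
        return 1.0 if j == "W0" else 0.0
    if j == "W1":                     # ageing, capped at W1
        return 1.0 - lam[i]
    if j == "CM1":                    # failure
        return lam[i]
    return 0.0

def P_joint(j, u, i):
    all_working = all(ic != "CM1" for ic in i)
    if all_working and all(uc == 0 for uc in u):   # case 1: system operates
        return prod(P_single(jc, 0, ic) for jc, ic in zip(j, i))
    p = 1.0                                        # case 2: system stopped
    for jc, uc, ic in zip(j, u, i):
        if uc == 1 or ic == "CM1":                 # maintained parts evolve
            p *= P_single(jc, uc, ic)
        else:                                      # idle working parts freeze
            p *= 1.0 if jc == ic else 0.0
    return p

total = sum(P_joint(j, (1, 0), ("W1", "W1")) for j in product(STATES, repeat=2))
print(total)
```

Summing the joint probabilities over all successor pairs returns 1 in both cases, and a working component that is not maintained stays frozen in its state while the system is stopped.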
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1:
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc},

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)
Case 2:
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC Cc

with Cc = CCMc if ic ∈ {CM1, ..., CMNCMc} or jc = CM1,
Cc = CPMc if ic ∈ {PM1, ..., PMNPMc} or jc = PM1,
Cc = 0 otherwise.
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:
• Manpower: it would be interesting to limit the number of maintenance actions that can be carried out at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
59
• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
60
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods for solving infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for such an application.
61
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to other fields, such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either a finite horizon model directly, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
62
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6,  u*2(0) = u*(E) = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5,  u*2(1) = u*(F) = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3,  u*2(2) = u*(G) = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10,  u*1(0) = u*(B) = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6,  u*1(1) = u*(C) = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5,  u*1(2) = u*(D) = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8,  u*0(0) = u*(A) = 2
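The calculation above can be reproduced mechanically. The sketch below re-runs the value iteration on the same arc costs, with the decision u identified with the index of the successor node:

```python
# Arc costs C[(k, i, j)] transcribed from the example above (j = successor
# node; stage-3 arcs lead to the single terminal node 0 with phi(0) = 0).
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}
N = 4
J = {(N, 0): 0.0}                 # terminal condition
policy = {}
for k in reversed(range(N)):      # backward over stages 3, 2, 1, 0
    for i in sorted({i for (kk, i, _) in C if kk == k}):
        succ = {j: c for (kk, ii, j), c in C.items() if (kk, ii) == (k, i)}
        best = min(succ, key=lambda j: succ[j] + J[(k + 1, j)])
        J[(k, i)] = succ[best] + J[(k + 1, best)]
        policy[(k, i)] = best
print(J[(0, 0)], policy[(0, 0)])
```

It returns J*0(0) = 8 with u*0(0) = 2, matching the calculation above step by step.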
63
Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996
[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999
66
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
67
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005
68
Chapter 9
A Proposed Finite Horizon
Replacement Model
A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.
The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.
If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was adopted for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model

Numbers:
N_E    Number of electricity scenarios
N_W    Number of working states for the component
N_PM   Number of preventive maintenance states for one component
N_CM   Number of corrective maintenance states for one component

Costs:
C_E(s, k)   Electricity cost at stage k for the electricity state s
C_I         Cost per stage for interruption
C_PM        Cost per stage of preventive maintenance
C_CM        Cost per stage of corrective maintenance
C_N(i)      Terminal cost if the component is in state i

Variables:
i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage

State and control space:
x1_k   Component state at stage k
x2_k   Electricity state at stage k

Probability functions:
λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state W_i

Sets:
Ω_x1      Component state space
Ω_x2      Electricity state space
Ω_U(i)    Decision space for state i

State notations:
W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state
9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length T_s, such that T = N · T_s. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · T_s kWh are produced during the stage (T_s in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space

The state vector X_k is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (N_X = 2). The state of the system is thus represented by a vector as in (9.1):

    X_k = (x1_k, x2_k)^T,   x1_k ∈ Ω_x1, x2_k ∈ Ω_x2        (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.
Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age T_max is reached; in this case, T_max can for example correspond to the age at which λ(t) exceeds 50%. The latter approach was implemented. In both cases, the corresponding number of W states is N_W = T_max/T_s, or the closest integer.
[Figure 9.1 here: Markov decision process diagram with states CM2, CM1, W0-W4, PM1; ageing transitions W_q → W_{q+1} with probability 1 − T_s λ(q), and failure transitions W_q → CM1 with probability T_s λ(q).]

Figure 9.1: Example of a Markov decision process for one component, with N_CM = 3, N_PM = 2, N_W = 4. Solid lines: u = 0; dashed lines: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

    Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}
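The construction of the state space Ω_x1 can be sketched in a few lines of Python; the state labels and the helper name `component_states` are illustrative, not from the thesis.

```python
# Build the component state space Omega_x1 = {W0..W_NW, PM1..PM_{NPM-1},
# CM1..CM_{NCM-1}} described above.  `component_states` is a hypothetical
# helper name used only for this sketch.
def component_states(n_w, n_pm, n_cm):
    return ([f"W{q}" for q in range(n_w + 1)]
            + [f"PM{q}" for q in range(1, n_pm)]
            + [f"CM{q}" for q in range(1, n_cm)])

# The instance of Figure 9.1 (N_W = 4, N_PM = 2, N_CM = 3):
print(component_states(4, 2, 3))
# → ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2']
```

Note that PM_{NPM} and CM_{NCM} are deliberately absent: both are represented by W0, as explained above.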
Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserves in a country such as Sweden. Hydropower constitutes a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2 here: electricity price (200-500 SEK/MWh) as a function of the stage, shown for stages k−1, k, k+1, with one curve per scenario (Scenario 1, Scenario 2, Scenario 3).]

Figure 9.2: Example of electricity scenarios, N_E = 3.
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

    U_k = 0   no preventive maintenance
    U_k = 1   preventive maintenance

The decision space depends only on the component state i1:

    Ω_U(i) = {0, 1}   if i1 ∈ {W1, ..., W_NW}
             ∅        otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

    P(X_{k+1} = j | U_k = u, X_k = i)
      = P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
      = P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
      = P(j1, u, i1) · P_k(j2, i2)
Component state transition probability

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the stage and equal to λ(W_q) = λ(q · T_s).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P_E^1, P_E^2 and P_E^3; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

    i1                           u    j1        P(j1, u, i1)
    Wq, q ∈ {0, ..., NW−1}       0    Wq+1      1 − λ(Wq)
    Wq, q ∈ {0, ..., NW−1}       0    CM1       λ(Wq)
    WNW                          0    WNW       1 − λ(WNW)
    WNW                          0    CM1       λ(WNW)
    Wq, q ∈ {0, ..., NW}         1    PM1       1
    PMq, q ∈ {1, ..., NPM−2}     ∅    PMq+1     1
    PM(NPM−1)                    ∅    W0        1
    CMq, q ∈ {1, ..., NCM−2}     ∅    CMq+1     1
    CM(NCM−1)                    ∅    W0        1
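Under assumed failure probabilities, the table can be turned into explicit transition matrices, one per decision. A minimal NumPy sketch for the instance of Figure 9.1, where the λ(W_q) values are invented for illustration:

```python
import numpy as np

# Transition matrices P[u] implementing Table 9.1 for the instance of
# Figure 9.1 (N_W = 4, N_PM = 2, N_CM = 3).  The failure probabilities
# lam[q] = lambda(W_q) are assumed values, not data from the thesis.
N_W, N_PM, N_CM = 4, 2, 3
states = ([f"W{q}" for q in range(N_W + 1)]
          + [f"PM{q}" for q in range(1, N_PM)]
          + [f"CM{q}" for q in range(1, N_CM)])
idx = {s: n for n, s in enumerate(states)}
lam = [0.05, 0.08, 0.12, 0.18, 0.25]

P = {u: np.zeros((len(states), len(states))) for u in (0, 1)}
for q in range(N_W + 1):
    ageing = f"W{min(q + 1, N_W)}"               # W_NW ages no further
    P[0][idx[f"W{q}"], idx[ageing]] = 1 - lam[q]
    P[0][idx[f"W{q}"], idx["CM1"]] = lam[q]      # failure during the stage
    P[1][idx[f"W{q}"], idx["PM1"]] = 1.0         # preventive replacement
for u in (0, 1):                                 # maintenance progresses
    for prefix, n in (("PM", N_PM), ("CM", N_CM)):  # whatever the decision
        for q in range(1, n):
            nxt = f"{prefix}{q + 1}" if q < n - 1 else "W0"
            P[u][idx[f"{prefix}{q}"], idx[nxt]] = 1.0

assert all(np.allclose(P[u].sum(axis=1), 1.0) for u in (0, 1))
```

Each row of `P[0]` and `P[1]` sums to one, which is a cheap check that no transition of Table 9.1 was missed.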
Table 9.2: Example of transition matrices for electricity scenarios

    P_E^1 = ( 1  0  0 )      P_E^2 = ( 1/3  1/3  1/3 )      P_E^3 = ( 0.6  0.2  0.2 )
            ( 0  1  0 )              ( 1/3  1/3  1/3 )              ( 0.2  0.6  0.2 )
            ( 0  0  1 )              ( 1/3  1/3  1/3 )              ( 0.2  0.2  0.6 )
Table 9.3: Example of transition probabilities on a 12-stage horizon

    Stage (k)      0      1      2      3      4      5      6      7      8      9      10     11
    P_k(j2, i2)    P_E^1  P_E^1  P_E^1  P_E^3  P_E^3  P_E^2  P_E^2  P_E^2  P_E^3  P_E^1  P_E^1  P_E^1
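The matrices of Table 9.2 and the schedule of Table 9.3 can be written out directly. A small sketch (the initial scenario S1 is an assumption of the sketch) that propagates the scenario distribution through the year:

```python
import numpy as np

# Transition matrices of Table 9.2 and the 12-stage schedule of Table 9.3.
P1 = np.eye(3)                            # stable scenarios
P2 = np.full((3, 3), 1.0 / 3.0)           # summer: fully transient
P3 = np.array([[0.6, 0.2, 0.2],
               [0.2, 0.6, 0.2],
               [0.2, 0.2, 0.6]])          # mildly transient
schedule = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]  # P_k, k = 0..11

p = np.array([1.0, 0.0, 0.0])   # assume the year starts in scenario S1
for P_k in schedule:
    p = p @ P_k                 # one-stage propagation of the distribution
print(p)                        # → uniform [1/3, 1/3, 1/3]
```

The uniform matrix P_E^2 makes the scenarios mix completely during the summer, and the uniform distribution is invariant under P_E^1 and P_E^3, which matches the dry-year/wet-year interpretation: by the end of the transient season, the past scenario carries no information.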
9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · T_s · C_E(i2, k) (depends on the electricity scenario state i2 and the stage k).

• Cost for maintenance: C_CM or C_PM.

• Cost for interruption: C_I.

Moreover, a terminal cost, denoted C_N, could be used to penalize deviations from a required state at the end of the time horizon; this option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by C_N(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

    i1                           u    j1        Ck(j, u, i)
    Wq, q ∈ {0, ..., NW−1}       0    Wq+1      G · Ts · CE(i2, k)
    Wq, q ∈ {0, ..., NW−1}       0    CM1       CI + CCM
    WNW                          0    WNW       G · Ts · CE(i2, k)
    WNW                          0    CM1       CI + CCM
    Wq                           1    PM1       CI + CPM
    PMq, q ∈ {1, ..., NPM−2}     ∅    PMq+1     CI + CPM
    PM(NPM−1)                    ∅    W0        CI + CPM
    CMq, q ∈ {1, ..., NCM−2}     ∅    CMq+1     CI + CCM
    CM(NCM−1)                    ∅    W0        CI + CCM
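Putting the pieces together, the one-component model can be solved by the backward recursion J_k(x1, x2) = min_u E[C_k + J_{k+1}]. The sketch below treats the production reward as a negative cost, so that the recursion minimizes total cost; every number in it (failure probabilities, prices, cost levels, N_E = 2 scenarios) is an illustrative assumption, not data from the thesis.

```python
import numpy as np

# Finite-horizon recursion for the one-component model on the instance of
# Figure 9.1 (N_W = 4, N_PM = 2, N_CM = 3), with two electricity scenarios.
# All numeric values are assumptions for illustration.
N_PM, N_CM, N_W = 2, 3, 4
states = ["W0", "W1", "W2", "W3", "W4", "PM1", "CM1", "CM2"]
idx = {s: n for n, s in enumerate(states)}
lam = {f"W{q}": p for q, p in enumerate([0.05, 0.08, 0.12, 0.18, 0.25])}
N, G, Ts = 12, 1000.0, 730.0          # stages, avg production (kW), h/stage
C_I, C_PM, C_CM = 2e4, 1e4, 5e4       # interruption and maintenance costs
C_E = np.tile([0.30, 0.45], (N, 1))   # price per kWh for scenarios S1, S2
P_E = np.array([[0.9, 0.1],           # electricity scenario transitions
                [0.2, 0.8]])

def successors(x, u):
    """[(next component state, probability, stage cost or None=production)]"""
    if x.startswith("PM") or x.startswith("CM"):
        pre, n_m = ("PM", N_PM) if x.startswith("PM") else ("CM", N_CM)
        q = int(x[2:])
        nxt = f"{pre}{q + 1}" if q < n_m - 1 else "W0"
        return [(nxt, 1.0, C_I + (C_PM if pre == "PM" else C_CM))]
    if u == 1:                                    # preventive replacement
        return [("PM1", 1.0, C_I + C_PM)]
    q = int(x[1:])
    return [(f"W{min(q + 1, N_W)}", 1.0 - lam[x], None),
            ("CM1", lam[x], C_I + C_CM)]

J = np.zeros((len(states), 2))                    # zero terminal cost
for k in reversed(range(N)):
    Jk = np.empty_like(J)
    for x in states:
        # Decision space: preventive maintenance allowed from W1..W_NW.
        controls = (0, 1) if x.startswith("W") and x != "W0" else (0,)
        for s in range(2):
            vals = []
            for u in controls:
                tot = 0.0
                for nxt, p, cost in successors(x, u):
                    c = -G * Ts * C_E[k, s] if cost is None else cost
                    tot += p * (c + P_E[s] @ J[idx[nxt]])
                vals.append(tot)
            Jk[idx[x], s] = min(vals)
    J = Jk
print(J[idx["W0"]])   # expected total cost from a new component, per scenario
```

With these numbers the production reward dominates, so the cost-to-go from a new component is negative (a profit); the interesting output in practice is the minimizing decision per state and stage, which the same loop can record.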
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.

This can be very interesting if the interruption cost is high, or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers:
N_C     Number of components
N_Wc    Number of working states for component c
N_PMc   Number of preventive maintenance states for component c
N_CMc   Number of corrective maintenance states for component c

Costs:
C_PMc     Cost per stage of preventive maintenance for component c
C_CMc     Cost per stage of corrective maintenance for component c
C_Nc(i)   Terminal cost if component c is in state i

Variables:
ic, c ∈ {1, ..., NC}   State of component c at the current stage
i(NC+1)                Electricity state at the current stage
jc, c ∈ {1, ..., NC}   State of component c for the next stage
j(NC+1)                Electricity state for the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and control space:
xc_k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
x(NC+1)_k                Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k

Probability functions:
λc(i)   Failure probability function for component c

Sets:
Ω_xc        State space for component c
Ω_x(NC+1)   Electricity state space
Ω_uc(ic)    Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages, with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.

• An interruption cost C_I is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · T_s kWh are produced during the stage (T_s in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

    X_k = (x1_k, ..., xNC_k, x(NC+1)_k)^T        (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and x(NC+1)_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is denoted Ω_xc:

    xc_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

    uc_k = 0   no preventive maintenance on component c
    uc_k = 1   preventive maintenance on component c

The decision variables constitute a decision vector:

    U_k = (u1_k, u2_k, ..., uNC_k)^T        (9.3)

The decision space for each decision variable is defined by:

    ∀c ∈ {1, ..., NC}:  Ω_uc(ic) = {0, 1}   if ic ∈ {W0, ..., W_NWc}
                                   ∅        otherwise
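The joint decision space of (9.3) is the Cartesian product of the per-component decision spaces. A tiny illustration (the three component states chosen are arbitrary):

```python
from itertools import product

# Enumerate the decision vectors U_k of (9.3) for a hypothetical 3-component
# system: only components in a working state admit a decision; components in
# maintenance are treated as fixed at u_c = 0 in this sketch.
comp_states = ["W2", "CM1", "W0"]               # assumed current states
def options(ic):                                # Omega_uc(ic)
    return (0, 1) if ic.startswith("W") else (0,)

decision_vectors = list(product(*(options(ic) for ic in comp_states)))
print(decision_vectors)
# → [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)]
```

The size of this product grows as 2^(number of working components), which is one reason the manpower-limited global decision space mentioned in Section 9.3 would change the model noticeably.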
9.2.4.3 Transition Probability

The component state variables xc are independent of the electricity state x(NC+1). Consequently:

    P(X_{k+1} = j | U_k = U, X_k = i)                                              (9.4)
      = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(j(NC+1), i(NC+1))    (9.5)

The transition probabilities of the electricity state, P(j(NC+1), i(NC+1)), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc} and uc = 0,

    P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or preventive maintenance is decided for some component:

    P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with

    P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., W_NWc}
          1              if uc = 0, ic ∈ {W1, ..., W_NWc} and jc = ic
          0              otherwise
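The two cases can be written as a small function. In the sketch below, `P` is a dictionary of single-component probabilities in the sense of Table 9.1 (with the maintenance-state rows stored under u = 1 for convenience), all W states are treated as working, and every name is hypothetical:

```python
# Sketch of the component part of the multi-component transition probability.
def working(s):
    return s.startswith("W")

def joint_prob(j, u, i, P):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) following cases 1 and 2."""
    if all(working(ic) for ic in i) and not any(u):
        # Case 1: system up, no maintenance decided -> components age
        # independently.
        prob = 1.0
        for jc, ic in zip(j, i):
            prob *= P.get((jc, 0, ic), 0.0)
        return prob
    # Case 2: system down -> maintained components evolve; the remaining
    # working components are frozen in their current state.
    prob = 1.0
    for jc, uc, ic in zip(j, u, i):
        if uc == 1 or not working(ic):
            prob *= P.get((jc, 1, ic), 0.0)
        else:
            prob *= 1.0 if jc == ic else 0.0
    return prob

# Two identical toy components (N_W = 1, N_PM = N_CM = 2), assumed numbers:
P = {("W1", 0, "W0"): 0.9, ("CM1", 0, "W0"): 0.1,
     ("W1", 0, "W1"): 0.9, ("CM1", 0, "W1"): 0.1,
     ("PM1", 1, "W0"): 1.0, ("PM1", 1, "W1"): 1.0,
     ("W0", 1, "PM1"): 1.0, ("W0", 1, "CM1"): 1.0}
print(joint_prob(("W1", "W1"), (0, 0), ("W0", "W1"), P))   # → 0.81
print(joint_prob(("PM1", "W1"), (1, 0), ("W1", "W1"), P))  # → 1.0
```

The second call shows the freezing effect: component 2 stays in W1 with probability one while component 1 is sent to preventive maintenance.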
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc},

    C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · T_s · C_E(i(NC+1), k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

    C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = C_I + ∑_{c=1}^{NC} C^c

with

    C^c = C_CMc   if ic ∈ {CM1, ..., CM_NCMc} or jc = CM1
          C_PMc   if ic ∈ {PM1, ..., PM_NPMc} or jc = PM1
          0       otherwise
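The stage cost follows the same two-case pattern. A sketch with invented cost and price numbers, writing the production reward as a negative cost so that the objective is a pure minimization:

```python
# Sketch of the multi-component stage cost (cases 1 and 2).  All cost and
# price values below are illustrative assumptions.
G, Ts = 1000.0, 730.0                  # kW, hours per stage
C_I = 2e4
C_PM = {0: 1e4, 1: 0.8e4}              # per-component maintenance costs
C_CM = {0: 5e4, 1: 4e4}

def working(s):
    return s.startswith("W")

def stage_cost(j, u, i, price):
    """C((j1..jNC), (u1..uNC), (i1..iNC)); `price` stands for C_E(i_{NC+1}, k)."""
    if (all(working(ic) for ic in i) and not any(u)
            and all(jc != "CM1" for jc in j)):
        return -G * Ts * price         # case 1: production reward
    total = C_I                        # case 2: interruption + maintenance
    for c, (jc, ic) in enumerate(zip(j, i)):
        if ic.startswith("CM") or jc == "CM1":
            total += C_CM[c]
        elif ic.startswith("PM") or jc == "PM1":
            total += C_PM[c]
    return total

print(stage_cost(("W2", "W1"), (0, 0), ("W1", "W1"), 0.30))   # → -219000.0
print(stage_cost(("PM1", "W1"), (1, 0), ("W1", "W1"), 0.30))  # → 30000.0
```

Note that the interruption cost C_I is charged once for the whole system, while the maintenance costs are summed per component, which is exactly what makes grouping maintenance actions attractive in this model.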
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Stochastic time to repair. The time to repair is in reality not deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm empirically converges the fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields, such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which approximates a finite horizon model but must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states, to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
    J*_4(0) = φ(0) = 0

Stage 3:
    J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
    J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
    J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
    J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
    u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

    J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
    u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

    J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
    u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
    J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
    u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

    J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
    u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

    J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
    u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
    J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
    u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
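The whole backward recursion of this appendix can be checked mechanically. In the sketch below, the arc costs C(k, i, j) are read off the calculations above; the dictionary-based data structure is otherwise an assumption of the sketch.

```python
# Backward value iteration for the shortest-path example of this appendix.
# C[(k, i, j)] is the cost of going from state i at stage k to state j.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}
N = 4
J = {(N, 0): 0}                       # terminal cost phi(0) = 0
u_opt = {}
for k in range(N - 1, -1, -1):        # sweep backwards over the stages
    for i in {i for (kk, i, _) in C if kk == k}:
        options = {j: J[(k + 1, j)] + c
                   for (kk, ii, j), c in C.items() if (kk, ii) == (k, i)}
        u_opt[(k, i)] = min(options, key=options.get)
        J[(k, i)] = options[u_opt[(k, i)]]

print(J[(0, 0)])                      # → 8, the optimal cost found above

path = [0]                            # forward pass through the policy
for k in range(N):
    path.append(u_opt[(k, path[-1])])
print(path)                           # → [0, 2, 2, 1, 0], i.e. A, D, G, I, destination
```

The recovered optimal cost (8) and path match the hand calculation; at node C both successors tie at cost 6, and the code simply returns the first minimizer.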
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A-H. Mohamed. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
66
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
67
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005
68
do maintenance immediately to be operational later, and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.
In the Scandinavian countries, a large part of the electricity is based on hydropower. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.
9.1.2 Notations for the Proposed Model
Numbers
NE    Number of electricity scenarios
NW    Number of working states for one component
NPM   Number of preventive maintenance states for one component
NCM   Number of corrective maintenance states for one component
Costs
CE(s, k)   Electricity cost at stage k for the electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i
Variables
i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage
State and Control Space
x1k   Component state at stage k
x2k   Electricity state at stage k
Probability function
λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi
Sets
Ωx1     Component state space
Ωx2     Electricity state space
ΩU(i)   Decision space for state i
State notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state
9.1.3 Assumptions
• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k)ᵀ,   x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case, Tmax can for example correspond to the time when λ(t) > 50%, with λ(t) = λ(Tmax) for t > Tmax. This second approach was implemented. The corresponding number of W states is NW = Tmax/Ts, rounded to the closest integer, in both cases.
Figure 9.1: Example of Markov Decision Process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0 (from each state Wq, transition to Wq+1, or to WNW itself for q = NW, with probability 1 − Ts·λ(q), and to CM1 with probability Ts·λ(q); the PM and CM states advance deterministically back to W0). Dashed lines: u = 1 (transition to PM1 with probability 1).
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
Electricity scenario state
Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden, where hydropower accounts for a large part of the electricity generation and is moreover a cheap source of energy. Consequently, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.
Figure 9.2: Example of electricity scenarios, NE = 3. The electricity price (in SEK/MWh, here ranging from about 200 to 500) is plotted over the stages k−1, k, k+1 for Scenarios 1, 2 and 3.
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0   no preventive maintenance
Uk = 1   preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and ΩU(i) = ∅ otherwise.
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability
At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if NPM = 1 (respectively NCM = 1), then PM1 (respectively CM1) corresponds to W0.
Electricity State
The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                         u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}     0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}     0   CM1     λ(Wq)
WNW                        0   WNW     1 − λ(WNW)
WNW                        0   CM1     λ(WNW)
Wq, q ∈ {0, ..., NW}       1   PM1     1
PMq, q ∈ {1, ..., NPM−2}   ∅   PMq+1   1
PMNPM−1                    ∅   W0      1
CMq, q ∈ {1, ..., NCM−2}   ∅   CMq+1   1
CMNCM−1                    ∅   W0      1
Table 9.2: Example of transition matrices for electricity scenarios

P1E = [1 0 0; 0 1 0; 0 0 1]

P2E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]

P3E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon; such a terminal cost is defined by CN(i) for each possible terminal state i of the component. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.
Table 9.4: Transition costs

i1                         u   j1      Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}     0   Wq+1    G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}     0   CM1     CI + CCM
WNW                        0   WNW     G · Ts · CE(i2, k)
WNW                        0   CM1     CI + CCM
Wq                         1   PM1     CI + CPM
PMq, q ∈ {1, ..., NPM−2}   ∅   PMq+1   CI + CPM
PMNPM−1                    ∅   W0      CI + CPM
CMq, q ∈ {1, ..., NCM−2}   ∅   CMq+1   CI + CCM
CMNCM−1                    ∅   W0      CI + CCM
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it can therefore be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c
Costs
CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i
Variables
ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  State of the electricity at the current stage
jc, c ∈ {1, ..., NC}   State of component c for the next stage
jNC+1                  State of the electricity for the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c
State and Control Space
xck, c ∈ {1, ..., NC}   State of component c at stage k
xc                      A component state
xNC+1k                  Electricity state at stage k
uck                     Maintenance decision for component c at stage k
Probability functions
λc(i)   Failure probability function for component c
Sets
Ωxc       State space for component c
ΩxNC+1    Electricity state space
Ωuc(ic)   Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system, whatever the maintenance actions are.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)ᵀ   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.
Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uck = 0   no preventive maintenance on component c
uck = 1   preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)ᵀ   (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and Ωuc(ic) = ∅ otherwise.
9.2.4.3 Transition Probabilities

The component state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. Consequently, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏ (c = 1 to NC) P(jc, 0, ic)

Case 2
If one of the components is in maintenance, or preventive maintenance is decided for some component:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏ (c = 1 to NC) Pc

with Pc = P(jc, uc, ic)   if uc = 1 or ic ∉ {W0, ..., WNWc} (component c is under, or sent to, maintenance)
     Pc = 1               if uc = 0, ic ∈ {W0, ..., WNWc} and jc = ic (working components do not age while the system is down)
     Pc = 0               otherwise
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑ (c = 1 to NC) Cc

with Cc = CCMc   if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
     Cc = CPMc   if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
     Cc = 0      otherwise
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically the fastest to converge; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the current state of the system. Only single-state-variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for such an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which approximates the finite horizon model but must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6,  u*2(0) = u*(E) = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5,  u*2(1) = u*(F) = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3,  u*2(2) = u*(G) = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10,  u*1(0) = u*(B) = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6,  u*1(1) = u*(C) = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5,  u*1(2) = u*(D) = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8,  u*0(0) = u*(A) = 2
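The backward recursion above can be checked with a few lines of code. The sketch below is not part of the thesis: it stores the arc costs C(k, i, j) read off the computation above and recomputes the cost-to-go, reproducing J*(A) = 8 (the stored policy holds successor-node indices).

```python
# Arc costs C(k, i, j) of the shortest-path example, as used above.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}
N = 4
J = {(N, 0): 0}                       # terminal cost phi(0) = 0
policy = {}
for k in range(N - 1, -1, -1):        # backward value iteration
    for i in {i for (kk, i, _) in C if kk == k}:
        options = {j: c + J[(k + 1, j)]
                   for (kk, ii, j), c in C.items() if kk == k and ii == i}
        policy[(k, i)] = min(options, key=options.get)
        J[(k, i)] = options[policy[(k, i)]]
print(J[(0, 0)])                      # -> 8, matching J*(A)
```

Following the stored policy from node A gives the optimal path A, D, G, I and then the terminal node, with total cost 8.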
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A.-H. Mohamed. Inspection, maintenance and replacement models. Computers & Operations Research, 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
67
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005
68
x1_k   Component state at stage k
x2_k   Electricity state at stage k

Probability functions

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets

Ω_x1     Component state space
Ω_x2     Electricity state space
Ω_U(i)   Decision space for state i

State notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state
9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed to be perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. The electricity price may switch from one scenario to another during the time span; the transition probabilities at each stage are assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.
9.1.4 Model Description

9.1.4.1 State Space
The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),  x1_k ∈ Ω_x1, x2_k ∈ Ω_x2    (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.
Component state
The status of the component (its age) at each stage is represented by the state variable x1_k. There are three types of possible states for this variable: normal states (W), when the component is working; corrective maintenance (CM) states, if the component is in maintenance due to a failure; and preventive maintenance (PM) states. Being in a given state means that the component has been in the corresponding condition during the last stage. For example, if the component is in a PM state, it has undergone preventive maintenance during the last stage. The numbers of CM and PM states for the component are NCM and NPM, respectively.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always carried out. Another possibility is to assume that λ(t) stays constant for t > Tmax; Tmax can then for example be chosen as the time when λ(t) > 50. The latter approach was implemented here. In both cases, the corresponding number of W states is NW = Tmax/Ts, rounded to the closest integer.
[Figure 9.1: Example of a Markov Decision Process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1. Under u = 0, state Wq moves to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); W4 loops on itself with probability 1 − Ts·λ(4). The PM and CM states advance to the next maintenance state, or back to W0, with probability 1.]
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}
Electricity scenario state

Electricity scenarios are associated with one state variable, x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (dry, normal and wet years, respectively). The weather during the season influences the water reserves in a country such as Sweden. Hydropower accounts for a large part of the electricity generation in Sweden, and it is a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2: Example of electricity scenarios, NE = 3. The plot shows the electricity prices (SEK/MWh, in the range 200–500) of Scenarios 1, 2 and 3 over stages k−1, k, k+1.]
9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, ∅ otherwise
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 (respectively CM1) corresponds to W0.
Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities (non-zero entries)

i1                         u   j1     P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}     0   Wq+1   1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}     0   CM1    λ(Wq)
WNW                        0   WNW    1 − λ(WNW)
WNW                        0   CM1    λ(WNW)
Wq, q ∈ {0, ..., NW}       1   PM1    1
PMq, q ∈ {1, ..., NPM−2}   ∅   PMq+1  1
PM(NPM−1)                  ∅   W0     1
CMq, q ∈ {1, ..., NCM−2}   ∅   CMq+1  1
CM(NCM−1)                  ∅   W0     1
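The chain of Table 9.1 is straightforward to encode. Below is a minimal Python sketch; the function name, the tuple state encoding and `p_fail` are illustrative choices, not from the thesis, with `p_fail(q)` playing the role of the per-stage failure probability λ(Wq).

```python
# A minimal sketch of the one-component transition probabilities of
# Table 9.1.  States are tuples ("W", q), ("PM", q) or ("CM", q);
# p_fail(q) stands for the per-stage failure probability lambda(Wq).
def transition(state, u, NW, NPM, NCM, p_fail):
    """Return {next_state: probability} for `state` under decision u."""
    kind, q = state
    if kind == "W":
        if u == 1:
            # preventive replacement: to PM1 (or directly to W0 if NPM = 1)
            return {(("PM", 1) if NPM > 1 else ("W", 0)): 1.0}
        survive = ("W", min(q + 1, NW))          # age, saturating at W_NW
        fail = ("CM", 1) if NCM > 1 else ("W", 0)
        return {survive: 1.0 - p_fail(q), fail: p_fail(q)}
    # PM/CM states evolve deterministically toward the as-new state W0
    last = (NPM if kind == "PM" else NCM) - 1
    return {(("W", 0) if q == last else (kind, q + 1)): 1.0}
```

With NW = 4, NPM = 2 and NCM = 3 this reproduces the chain of Figure 9.1.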
Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = | 1    0    0   |     P2_E = | 1/3  1/3  1/3 |     P3_E = | 0.6  0.2  0.2 |
       | 0    1    0   |            | 1/3  1/3  1/3 |            | 0.2  0.6  0.2 |
       | 0    0    1   |            | 1/3  1/3  1/3 |            | 0.2  0.2  0.6 |
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2)    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
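The matrices of Table 9.2 and the stage assignment of Table 9.3 can be written down directly; the small sketch below (names are illustrative, not from the thesis) propagates an initial scenario distribution through the stages. Starting from a known scenario, the first P2_E stage makes the distribution uniform, and it then stays uniform, since P1_E and P3_E are doubly stochastic.

```python
# The stage-dependent electricity-scenario chain of Tables 9.2 and 9.3:
# rows index the current scenario i2, columns the next scenario j2.
P1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # scenarios frozen
P2 = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]   # fully mixing
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]   # sticky scenarios
P_stage = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]  # Table 9.3

def propagate(p0, matrices):
    """Push an initial scenario distribution p0 through the stage matrices."""
    p = list(p0)
    for P in matrices:
        p = [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(p))]
    return p
```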
9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost for maintenance: CCM or CPM
• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i1) for each possible terminal state i1 of the component.
Table 9.4: Transition costs

i1                         u   j1     Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}     0   Wq+1   G · Ts · CE(i2, k)  (reward)
Wq, q ∈ {0, ..., NW−1}     0   CM1    CI + CCM
WNW                        0   WNW    G · Ts · CE(i2, k)  (reward)
WNW                        0   CM1    CI + CCM
Wq                         1   PM1    CI + CPM
PMq, q ∈ {1, ..., NPM−2}   ∅   PMq+1  CI + CPM
PM(NPM−1)                  ∅   W0     CI + CPM
CMq, q ∈ {1, ..., NCM−2}   ∅   CMq+1  CI + CCM
CM(NCM−1)                  ∅   W0     CI + CCM
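Putting Sections 9.1.4.1 to 9.1.4.4 together, the model can be solved by backward induction over the N stages. The sketch below is a simplified, self-contained version in which the electricity state is collapsed to a single constant price (a reward R per producing stage); all parameter values are illustrative, not taken from the thesis.

```python
# Simplified backward induction for the one-component model: the electricity
# state is collapsed to one constant price, so a producing stage earns a
# fixed reward R (encoded as negative cost).  Numbers are illustrative only.
NW, NPM, NCM = 4, 2, 3
R, C_I, C_PM, C_CM = 1.0, 2.0, 1.0, 3.0
p = lambda q: 0.05 * (q + 1)        # per-stage failure probability in Wq

states = ([("W", q) for q in range(NW + 1)]
          + [("PM", q) for q in range(1, NPM)]
          + [("CM", q) for q in range(1, NCM)])

def step(s, u):
    """List of (next_state, probability, cost), per Tables 9.1 and 9.4."""
    kind, q = s
    if kind == "W":
        if u == 1:
            return [(("PM", 1), 1.0, C_I + C_PM)]
        ok = ("W", min(q + 1, NW))
        return [(ok, 1.0 - p(q), -R), (("CM", 1), p(q), C_I + C_CM)]
    last, c = (NPM - 1, C_PM) if kind == "PM" else (NCM - 1, C_CM)
    nxt = ("W", 0) if q == last else (kind, q + 1)
    return [(nxt, 1.0, C_I + c)]

def solve(N):
    """Minimize expected total cost over N stages (zero terminal cost)."""
    J = {s: 0.0 for s in states}
    policy = []
    for k in range(N - 1, -1, -1):
        newJ, u_k = {}, {}
        for s in states:
            # replacement is only allowed in the working states W1..W_NW
            choices = [0, 1] if (s[0] == "W" and s[1] >= 1) else [0]
            vals = {u: sum(pr * (c + J[t]) for t, pr, c in step(s, u))
                    for u in choices}
            u_k[s] = min(vals, key=vals.get)
            newJ[s] = vals[u_k[s]]
        J, policy = newJ, [u_k] + policy
    return J, policy
```

`solve(N)` returns the cost-to-go at stage 0 and a per-stage policy; production as negative cost means the recursion maximizes expected profit while charging the maintenance and interruption costs of Table 9.4.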
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance, that is, maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to also do maintenance on components that are still working but would soon need maintenance anyway.

This can be very interesting if the interruption cost is high, or if the infrastructure needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it can then be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c

Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  Electricity state at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
jNC+1                  Electricity state at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Spaces

xc_k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
xNC+1_k                  Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k

Probability functions

λc(i)   Failure probability function for component c

Sets

Ω_xc        State space for component c
Ω_xNC+1     Electricity state space
Ω_uc(ic)    Decision space for component c in state ic
9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed to be perfectly known. It is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1_k, ..., xNC_k, xNC+1_k)    (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.

Component space
The numbers of CM and PM states for component c are NCMc and NPMc, respectively. The number of W states for each component c, NWc, is decided in the same way as for one component. The state space related to component c is denoted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity space
Same as in Section 9.1.
9.2.4.2 Decision Space

At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector as in (9.3):

Uk = (u1_k, u2_k, ..., uNC_k)    (9.3)

The decision space for each decision variable is defined by:

∀c ∈ {1, ..., NC}: Ω_uc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, ∅ otherwise
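The joint decision space is the product of the per-component spaces, so it grows as 2^m with the number m of working components. A small illustrative sketch (the helper names are not from the thesis; the empty decision space of a component in maintenance is represented by the single forced no-op decision 0):

```python
# Enumerate the joint decision space of Section 9.2.4.2 for a small system:
# each working component contributes {0, 1}, a component in maintenance
# contributes only the forced no-op decision 0.
from itertools import product

def joint_decisions(states, is_working):
    """All feasible decision vectors Uk = (u1, ..., uNC) in joint state `states`."""
    per_component = [(0, 1) if is_working(s) else (0,) for s in states]
    return list(product(*per_component))
```

With two working components and one in corrective maintenance, this gives 2 × 1 × 2 = 4 decision vectors.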
9.2.4.3 Transition Probabilities

The component state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.
Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. Consequently, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc},

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏(c=1..NC) P(jc, 0, ic)

Case 2
If one of the components is in maintenance, or preventive maintenance is decided for some component:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏(c=1..NC) Pc

with Pc = P(jc, uc, ic) if uc = 1 or ic ∉ {W1, ..., WNWc},
     Pc = 1 if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic,
     Pc = 0 otherwise.

In other words, a working component that is not sent to maintenance stays in its current state while the system is down.
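One reading of the two cases, as a hedged sketch (the helper names and the toy component law in the test are illustrative, not from the thesis): each component follows its own transition law when it is replaced or already in maintenance, and is frozen when it is working but the system is down.

```python
# Product-form joint transition probabilities, following the two cases of
# Section 9.2.4.3.  comp_P[c](ic, uc) returns {jc: prob} for component c on
# its own; is_working(ic) tells whether ic is a working (W) state.
from itertools import product

def system_transition(i, u, comp_P, is_working):
    """Return {next joint state j: probability} for joint state i, decisions u."""
    system_up = all(is_working(ic) for ic in i) and not any(u)
    per_comp = []
    for c, (ic, uc) in enumerate(zip(i, u)):
        if system_up or uc == 1 or not is_working(ic):
            per_comp.append(comp_P[c](ic, uc))   # Case 1, or active in Case 2
        else:
            per_comp.append({ic: 1.0})           # frozen while system is down
    out = {}
    for combo in product(*(d.items() for d in per_comp)):
        j = tuple(jc for jc, _ in combo)
        pr = 1.0
        for _, w in combo:
            pr *= w
        out[j] = out.get(j, 0.0) + pr
    return out
```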
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc},

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ(c=1..NC) Cc

with Cc = CCMc if ic ∈ {CM1, ..., CM(NCMc−1)} or jc = CM1,
     Cc = CPMc if ic ∈ {PM1, ..., PM(NPMc−1)} or jc = PM1,
     Cc = 0 otherwise.
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities to the maintenance states.

• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm was empirically found to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. From this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either a finite horizon model directly, or a discounted infinite horizon model, which approximates a finite horizon model but must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with several monitored parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm. The decision u denotes the successor state chosen at each stage.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6,  u*_2(0) = u*(E) = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5,  u*_2(1) = u*(F) = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3,  u*_2(2) = u*(G) = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10,  u*_1(0) = u*(B) = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6,  u*_1(1) = u*(C) = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5,  u*_1(2) = u*(D) = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8,  u*_0(0) = u*(A) = 2
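The computation above can be reproduced in a few lines. A minimal sketch of the backward induction, with the arc costs C(k, i, j) read off the appendix (node A is state 0 at stage 0, and so on) and the decision u taken as the index of the chosen successor state:

```python
# Backward induction for the shortest-path example.  C[k][i] maps each
# reachable successor state j to the arc cost C(k, i, j).
C = [
    {0: {0: 2, 1: 4, 2: 3}},                                    # stage 0: A
    {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},  # stage 1: B, C, D
    {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},  # stage 2: E, F, G
    {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                          # stage 3: H, I, J
]

def backward_induction(C, terminal_cost=0.0):
    """Cost-to-go tables J[k][i] and an optimal successor choice u[k][i]."""
    N = len(C)
    J = [dict() for _ in range(N)] + [{0: terminal_cost}]  # one terminal state
    u = [dict() for _ in range(N)]
    for k in range(N - 1, -1, -1):
        for i, arcs in C[k].items():
            best = min(arcs, key=lambda j: arcs[j] + J[k + 1][j])
            u[k][i], J[k][i] = best, arcs[best] + J[k + 1][best]
    return J, u

J, u = backward_induction(C)   # J[0][0] recovers J*(A) = 8
```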
bull A terminal cost (for stage N) can be used to penalize the terminal stagecondition
bull The manpower is assumed unlimited Spare parts are not considered
914 Model Description
9141 State Space
The state vector Xk is composed of two states variables x1k for the state of the
component (its age) and x2k for the electricity scenario NX = 2
The state of the system is thus represented by a vector as in (91)
Xk =
(x1k
x2k
)x1k isin Ωx1 x2
k isin Ωx2 (91)
Ωx1 is the set of possible states for the component and Ωx2 the set of possibleelectricity scenarios
Component state
The status of the component (its age) at each stage is represented by one statevariable x1
k There are three types of possible states for the variable Normalstate (W) when the component is working corrective maintenance (CM) states ifthe component is in maintenance due to failure and preventive maintenance (PM)states The meaning of a state is that the component has been in the corresponingcondition during the last stage For example if the component is in a state PMit means that during the last stage it has undertaken preventive maintenance Thenumber of CM and PM states for the component corresponds respectively to NCM
and NPM
To limit the size of the state space it is necessary to limit the number of states WIt can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax) preventivemaintenance is always made Another possibility is to assume that λi(t) staysconstant when age Tmax is reached In this case Tmax can correspond for exampleat the time when λ(t) gt 50 if tgtTmax This approach was implemented Thecorresponding number of W states is NW = TmaxTs or the closest integer in bothcases
50
[Figure: state-transition diagram over the states W0, W1, W2, W3, W4, PM1, CM1, CM2. Under u = 0 (solid lines), each state Wq moves to the next working state with probability (1 - Ts·λ(q)) and to CM1 with probability Ts·λ(q); W4 stays in W4 with probability (1 - Ts·λ(4)). Under u = 1 (dashed lines), each Wq moves to PM1 with probability 1. The maintenance chains CM1 → CM2 → W0 and PM1 → W0 have probability 1.]

Figure 9.1: Example of Markov Decision Process for one component, with N_CM = 3, N_PM = 2, N_W = 4. Solid line: u = 0; dashed line: u = 1.
Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x^1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{N_PM-1}, CM1, ..., CM_{N_CM-1}}
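To make the state-space bookkeeping concrete, Ω_x1 can be enumerated in a few lines of code (a minimal sketch; the function name and the string encoding of states are illustrative, not from the thesis):

```python
def component_state_space(n_w, n_pm, n_cm):
    """Enumerate Omega_x1 = {W0..W_NW, PM1..PM_{NPM-1}, CM1..CM_{NCM-1}}.
    W0 also represents the state after the last PM/CM stage, which is
    why the maintenance chains stop one state early."""
    states = ["W%d" % q for q in range(n_w + 1)]
    states += ["PM%d" % q for q in range(1, n_pm)]
    states += ["CM%d" % q for q in range(1, n_cm)]
    return states

# Example of Figure 9.1: N_W = 4, N_PM = 2, N_CM = 3
print(component_state_space(4, 2, 3))
# -> ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2']
```

Note that with N_PM = 1 (or N_CM = 1) no PM (or CM) state is generated at all, matching the remark that PM1 (respectively CM1) then coincides with W0.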
Electricity scenario state

Electricity scenarios are associated with one state variable x^2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x^2_k ∈ Ω_x2 = {S1, ..., S_{N_E}}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.
The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively a dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure: electricity price curves for Scenario 1, Scenario 2 and Scenario 3 plotted over stages k-1, k, k+1; vertical axis: electricity prices from 200 to 500 SEK/MWh]

Figure 9.2: Example of electricity scenarios, N_E = 3.
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance,
U_k = 1: preventive maintenance.

The decision space depends only on the component state i^1:

Ω_U(i) = {0, 1} if i^1 ∈ {W1, ..., W_NW}; ∅ otherwise.
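This decision space translates directly into code (a sketch; the name and state encoding are illustrative):

```python
def decision_space(i1, n_w):
    """Omega_U(i): preventive maintenance can be decided only when
    the component is in a working state W1..W_NW."""
    working = {"W%d" % q for q in range(1, n_w + 1)}
    return {0, 1} if i1 in working else set()

print(decision_space("W3", 4))   # -> {0, 1}
print(decision_space("CM1", 4))  # -> set()
```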
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 | u_k = u, x^1_k = i^1, x^2_k = i^2)
= P(x^1_{k+1} = j^1 | u_k = u, x^1_k = i^1) · P(x^2_{k+1} = j^2 | x^2_k = i^2)
= P(j^1, u, i^1) · P_k(j^2, i^2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts). The transition probability function for the component state is stationary; it can be represented as a Markov decision process, as in the example in Figure 9.1. Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity state transition probability

The transition probabilities of the electricity state, P_k(j^2, i^2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j^2, i^2) can take three different values, defined by the transition matrices P^1_E, P^2_E and P^3_E; i^2 is represented by the rows of the matrices and j^2 by the columns.
Table 9.1: Transition probabilities

i^1                           | u | j^1       | P(j^1, u, i^1)
W_q, q ∈ {0, ..., N_W - 1}    | 0 | W_{q+1}   | 1 - λ(W_q)
W_q, q ∈ {0, ..., N_W - 1}    | 0 | CM1       | λ(W_q)
W_{N_W}                       | 0 | W_{N_W}   | 1 - λ(W_{N_W})
W_{N_W}                       | 0 | CM1       | λ(W_{N_W})
W_q, q ∈ {0, ..., N_W}        | 1 | PM1       | 1
PM_q, q ∈ {1, ..., N_PM - 2}  | ∅ | PM_{q+1}  | 1
PM_{N_PM - 1}                 | ∅ | W0        | 1
CM_q, q ∈ {1, ..., N_CM - 2}  | ∅ | CM_{q+1}  | 1
CM_{N_CM - 1}                 | ∅ | W0        | 1
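Table 9.1 translates into a small transition-kernel function (a sketch; the per-stage failure probabilities in `lam` are hypothetical numbers, and `u = None` stands for the empty decision ∅ of the maintenance states):

```python
def component_transitions(i1, u, lam, n_w, n_pm, n_cm):
    """Non-zero transition probabilities P(j1, u, i1) of Table 9.1,
    returned as a dict {j1: probability}."""
    q = int(i1.lstrip("WPMC"))           # numeric part of the state name
    if i1.startswith("W"):
        if u == 1:                       # preventive replacement decided
            return {"PM1": 1.0}
        nxt = "W%d" % min(q + 1, n_w)    # age by one stage, saturating at W_NW
        return {nxt: 1.0 - lam[q], "CM1": lam[q]}
    if i1.startswith("PM"):              # deterministic PM chain
        return {"PM%d" % (q + 1): 1.0} if q < n_pm - 1 else {"W0": 1.0}
    return {"CM%d" % (q + 1): 1.0} if q < n_cm - 1 else {"W0": 1.0}

lam = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4, 4: 0.5}     # hypothetical values
print(component_transitions("W1", 0, lam, 4, 2, 3))    # -> {'W2': 0.8, 'CM1': 0.2}
print(component_transitions("CM1", None, lam, 4, 2, 3))  # -> {'CM2': 1.0}
```

Each returned row sums to one, which is a useful sanity check when plugging real failure-rate data into the model.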
Table 9.2: Example of transition matrices for the electricity scenarios

P^1_E = [ 1    0    0  ;  0    1    0  ;  0    0    1  ]
P^2_E = [ 1/3  1/3  1/3;  1/3  1/3  1/3;  1/3  1/3  1/3 ]
P^3_E = [ 0.6  0.2  0.2;  0.2  0.6  0.2;  0.2  0.2  0.6 ]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):     0     1     2     3     4     5     6     7     8     9     10    11
P_k(j^2, i^2): P^1_E P^1_E P^1_E P^3_E P^3_E P^2_E P^2_E P^2_E P^3_E P^1_E P^1_E P^1_E
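The non-stationary electricity kernel of Tables 9.2 and 9.3 can be stored as a stage-indexed list of matrices (a sketch with the numbers from the tables; names are illustrative):

```python
# Transition matrices of Table 9.2 (row: current scenario i2, column: next scenario j2)
P1E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2E = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
P3E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Stage-to-matrix assignment of Table 9.3 (12-stage horizon)
SCHEDULE = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def electricity_transition(k, j2, i2):
    """P_k(j2, i2): probability of moving from scenario i2 to j2 at stage k."""
    return SCHEDULE[k][i2][j2]

print(electricity_transition(3, 0, 0))  # stage 3 uses P3E -> 0.6
```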
9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · C_E(i^2, k) (depends on the electricity scenario state i^2 and the stage k)
• Cost for maintenance: C_CM or C_PM
• Cost for interruption: C_I

Moreover, a terminal cost, noted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i^2 is a state variable.

A possible terminal cost C_N(i^1) is defined for each possible terminal state i^1 of the component.
Table 9.4: Transition costs

i^1                           | u | j^1       | C_k(j, u, i)
W_q, q ∈ {0, ..., N_W - 1}    | 0 | W_{q+1}   | G · Ts · C_E(i^2, k)
W_q, q ∈ {0, ..., N_W - 1}    | 0 | CM1       | C_I + C_CM
W_{N_W}                       | 0 | W_{N_W}   | G · Ts · C_E(i^2, k)
W_{N_W}                       | 0 | CM1       | C_I + C_CM
W_q                           | 1 | PM1       | C_I + C_PM
PM_q, q ∈ {1, ..., N_PM - 2}  | ∅ | PM_{q+1}  | C_I + C_PM
PM_{N_PM - 1}                 | ∅ | W0        | C_I + C_PM
CM_q, q ∈ {1, ..., N_CM - 2}  | ∅ | CM_{q+1}  | C_I + C_CM
CM_{N_CM - 1}                 | ∅ | W0        | C_I + C_CM
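Table 9.4 likewise becomes a small cost function (a sketch; `price` plays the role of C_E, and G, Ts and the cost constants are hypothetical inputs; following the table, the electricity reward term is returned as listed rather than negated):

```python
def transition_cost(i1, u, j1, i2, k, price, G, Ts, C_I, C_CM, C_PM):
    """Transition costs C_k(j, u, i) of Table 9.4."""
    if i1.startswith("W") and j1 == "CM1":
        return C_I + C_CM            # the component fails during the stage
    if u == 1 or i1.startswith("PM"):
        return C_I + C_PM            # preventive maintenance starts or continues
    if i1.startswith("CM"):
        return C_I + C_CM            # corrective maintenance continues
    return G * Ts * price(i2, k)     # normal operation: electricity reward

flat_price = lambda i2, k: 300.0     # hypothetical C_E in SEK/MWh
print(transition_cost("W1", 0, "CM1", 0, 0, flat_price, 1000, 730, 5e4, 2e4, 1e4))
# -> 70000.0
```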
9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to also do maintenance on components that are still working but would need maintenance soon.

This can be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers
N_C: number of components
N_Wc: number of working states for component c
N_PMc: number of preventive maintenance states for component c
N_CMc: number of corrective maintenance states for component c

Costs
C_PMc: cost per stage of preventive maintenance for component c
C_CMc: cost per stage of corrective maintenance for component c
C_Nc(i): terminal cost if component c is in state i

Variables
i^c, c ∈ {1, ..., N_C}: state of component c at the current stage
i^{N_C+1}: electricity state at the current stage
j^c, c ∈ {1, ..., N_C}: state of component c at the next stage
j^{N_C+1}: electricity state at the next stage
u^c, c ∈ {1, ..., N_C}: decision variable for component c

State and control space
x^c_k, c ∈ {1, ..., N_C}: state of component c at stage k
x^c: a component state
x^{N_C+1}_k: electricity state at stage k
u^c_k: maintenance decision for component c at stage k

Probability functions
λ_c(i): failure probability function for component c

Sets
Ω_xc: state space for component c
Ω_x{N_C+1}: electricity state space
Ω_uc(i^c): decision space for component c in state i^c
9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.
• The failure rate of each component is assumed perfectly known over time. This function is noted λ_c(t) for component c ∈ {1, ..., N_C}.
• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages, with a cost of C_CMc per stage.
• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.
• An interruption cost C_I is considered, whatever maintenance is done on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh is produced during the stage (Ts in hours).
• A terminal cost C_Nc can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x^1_k, ..., x^{N_C}_k, x^{N_C+1}_k)^T    (9.2)

x^c_k, c ∈ {1, ..., N_C}, represents the state of component c, and x^{N_C+1}_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component. The state space related to component c is noted Ω_xc:

x^c_k ∈ Ω_xc = {W0, ..., W_{N_Wc}, PM1, ..., PM_{N_PMc - 1}, CM1, ..., CM_{N_CMc - 1}}
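The joint state space is the product of the component spaces and the electricity scenarios, which is what later drives the curse of dimensionality discussed in the conclusions (a sketch; the numbers below are hypothetical):

```python
def state_space_size(n_w, n_pm, n_cm, n_e):
    """|Omega_xc| = (N_Wc + 1) + (N_PMc - 1) + (N_CMc - 1) per component;
    the joint space is the product over components, times the N_E scenarios."""
    size = n_e
    for nw, npm, ncm in zip(n_w, n_pm, n_cm):
        size *= (nw + 1) + (npm - 1) + (ncm - 1)
    return size

# Three identical components with the numbers of Figure 9.1 (8 states each), N_E = 3
print(state_space_size([4, 4, 4], [2, 2, 2], [3, 3, 3], 3))  # -> 1536
```

Adding a fourth identical component multiplies the size by 8 again, illustrating the exponential growth in the number of components.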
Electricity space
Same as for the one-component model in Section 9.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

u^c_k = 0: no preventive maintenance on component c,
u^c_k = 1: preventive maintenance on component c.

The decision variables constitute a decision vector:

U_k = (u^1_k, u^2_k, ..., u^{N_C}_k)^T    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., N_C}: Ω_uc(i^c) = {0, 1} if i^c ∈ {W0, ..., W_{N_Wc}}; ∅ otherwise.
9.2.4.3 Transition Probabilities

The component state variables x^c are independent of the electricity state x^{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)    (9.4)
= P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) · P_k(j^{N_C+1}, i^{N_C+1})    (9.5)

The transition probabilities of the electricity state, P_k(j^{N_C+1}, i^{N_C+1}), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.
Component state transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. Consequently, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., N_C}: i^c ∈ {W1, ..., W_{N_Wc}},

P((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = ∏_{c=1}^{N_C} P(j^c, 0, i^c)
Case 2
If at least one component is in maintenance, or preventive maintenance is decided on some component:

P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = ∏_{c=1}^{N_C} P^c

with P^c =
  P(j^c, u^c, i^c)  if u^c = 1 or i^c ∉ {W0, ..., W_{N_Wc}}
  1                 if i^c ∈ {W0, ..., W_{N_Wc}}, u^c = 0 and j^c = i^c
  0                 otherwise
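The two cases can be combined into one joint-kernel function (a sketch; `P` is the one-component transition kernel, `working` a predicate on component states, and all names are illustrative, not from the thesis):

```python
def system_transition_prob(j, u, i, P, working):
    """Joint component-state transition probability (Cases 1 and 2).
    j, u, i are tuples over the components; u entries are 0, 1 or None
    (None plays the role of the empty decision for components in maintenance)."""
    if all(working(ic) for ic in i) and all(uc == 0 for uc in u):
        # Case 1: the system runs, components age independently
        prob = 1.0
        for jc, ic in zip(j, i):
            prob *= P(jc, 0, ic)
        return prob
    # Case 2: the system is down; working, unmaintained components do not age
    prob = 1.0
    for jc, uc, ic in zip(j, u, i):
        if uc == 1 or not working(ic):
            prob *= P(jc, uc, ic)
        elif jc != ic:
            return 0.0
    return prob
```

For example, with two components in W1 and u = (0, 0), the probability of both ageing to W2 is the product of the two individual probabilities; if one component is in CM, the other must stay in its current state.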
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., N_C}: i^c ∈ {W1, ..., W_{N_Wc}} and j^c ≠ CM1,

C((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = G · Ts · C_E(i^{N_C+1}, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all maintenance actions:

C((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = C_I + Σ_{c=1}^{N_C} C^c

with C^c =
  C_CMc  if i^c ∈ {CM1, ..., CM_{N_CMc - 1}} or j^c = CM1
  C_PMc  if i^c ∈ {PM1, ..., PM_{N_PMc - 1}} or j^c = PM1
  0      otherwise
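The corresponding joint cost can be sketched the same way (the per-component cost lists and the `price` function are illustrative names, not from the thesis):

```python
def system_transition_cost(j, u, i, i_elec, k, price, G, Ts, C_I, cost_cm, cost_pm):
    """Joint transition cost (Cases 1 and 2 of the cost function)."""
    normal = (all(ic.startswith("W") for ic in i)
              and all(uc == 0 for uc in u)
              and not any(jc == "CM1" for jc in j))
    if normal:
        return G * Ts * price(i_elec, k)   # Case 1: electricity reward
    total = C_I                            # Case 2: interruption plus maintenance
    for c, (jc, ic) in enumerate(zip(j, i)):
        if ic.startswith("CM") or jc == "CM1":
            total += cost_cm[c]
        elif ic.startswith("PM") or jc == "PM1":
            total += cost_pm[c]
    return total

# Hypothetical two-component example: component 1 fails, component 2 keeps working
print(system_transition_cost(("CM1", "W3"), (0, 0), ("W2", "W2"), 0, 0,
                             lambda s, k: 300.0, 1000, 730, 5e4, [2e4, 3e4], [1e4, 1.5e4]))
# -> 70000.0
```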
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas of issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states: it could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm empirically converges fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the recent advances in ADP methods, this limitation could be overcome. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A

Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4, u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2, u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7, u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
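As a sanity check, the backward recursion above can be run in code (a small sketch; the nested dict encodes the arc costs C(k, i, j) of the example, with the nodes A–J mapped to per-stage indices 0–2):

```python
# Arc costs C(k, i, j): stage k, node i, successor j, from the example above
C = {
    0: {0: {0: 2, 1: 4, 2: 3}},
    1: {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},
    2: {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},
    3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},
}

def value_iteration(costs, n_stages):
    """Backward recursion J_k(i) = min_j [C(k, i, j) + J_{k+1}(j)]."""
    J = {0: 0}                            # terminal stage: phi(0) = 0
    for k in reversed(range(n_stages)):
        J = {i: min(c + J[j] for j, c in succ.items())
             for i, succ in costs[k].items()}
    return J

print(value_iteration(C, 4))  # -> {0: 8}, the optimal cost from A computed above
```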
CM2 CM1
W0 W1 W2 W3 W4
PM1
(1minus Tsλ(0)) (1minus Tsλ(1)) (1minus Tsλ(2)) (1minus Tsλ(3))
Tsλ(0) Tsλ(1) Tsλ(2) Tsλ(3) Tsλ(4)
(1minus Tsλ(4))
1
1
1
1 1 1 1 1
Figure 91 Example of Markov Decision Process for one component withNCM = 3NPM = 2 NW = 4 Solid line u=0 Dashed Line u=1
Figure 91 shows an example of graphical representation of the MDP model for onecomponent In this example x1
k isin Ωx1
= W0 W4 PM1 CM1 CM2 The StateW0 is used to represent a new component PM2 and CM3 are both representedwith this state
More generally
Ωx1
= W0 WNW PM1 PMNPMminus1 CM1 CMNCMminus1
51
Electricity scenario state
Electricity scenarios are associated with one state variable x2k There areNE possible
states for this variable each state corresponding to one possible electricity scenariox2k isin Ωx
2
= S1 SNe The electricity price of the scenario S at stage k is givenby the electricity price function CE(S k) Figure 92 shows an example for threepossibles scenarios
The example considers three electricity scenarios correspond to high medium andlow electricity prices (respectively dry normal and wet year) The weather duringthe season influence the water reserve in a country as Sweden Hydropower is alarge part of the electricity generation in Sweden Moreover this is a cheap sourceof energy In consequence if there is a low water reserve more expensive source ofenergy are needed and the electricity price is higher
13
13
13
Stage
Electricity Prices SEKMWh
Scenario 1
Scenario 2
Scenario 3
k-1 k k+1
200
250
300
350
400
450
500
Figure 92 Example of electricity scenarios NE = 3
52
9142 Decision Space
At each stage the decision maker can decide if the component is not in maintenanceto do preventive maintenance or not depending on the state X of the system
Uk = 0 no preventive maintenance
Uk = 1 preventive maintenance
The decision space depends only on the component state i1
ΩU (i) =
0 1 if i1 isin W1 WNW
empty else
9143 Transition Probabilities
The two state variables are independant Moreover only the electricity state tran-sitions depend on the stage Consequently
P (Xk+1 = j | Uk = uXk = i)
= P (x1k+1 = j1 x2
k+1 = j2 | uk = u x1k = i1 x2 = i2)
= P (x1k+1 = j1 | uk = u x1
k = i1) middot P (x2k+1 = j2 | x2
k = i2)
= P (j1 u i1) middot Pk(j2 i2)
Component state transition probability
At each stage k if the state of the component is Wq the failure rate is assumedconstant during the time of the stage and equal to λ(Wq) = λ(q middot Ts)
The transition probability for the component state is stationary It can be repre-sented as a Markov decision process as in the example in Figure 91
Table 91 summarizes the transition porbabilities that not equal to zero
Note that if NPM = 1 or NCM = 1 then PM1 respectively CM1 correspond to W0
Electricity State
The transition probabilities of the electricity state Pk(j2 i2) are not stationary
They can change from stage to stage 9143 with 93 give an example of transitionprobabilities for the electricity scenarios on a 12 stages horizon In this examplePk(j
2 i2) can take three different values defined by the transition matrices P 1E P 2
E
or P 3E i2 is represented by the rows of the matrices and j2 by the column
53
Table 91 Transition probabilities
i1 u j1 P (j1 u i1)
Wq q isin 0 NW minus 1 0 Wq+1 1minus λ(Wq)Wq q isin 0 NW minus 1 0 CM1 λ(Wq)WNW 0 WNW 1minus λ(WNW )WNW 0 CM1 λ(WNW )Wq q isin 0 NW 1 PM1 1
PMq q isin 1 NPM minus 2 empty PMq+1 1PMNPMminus1 empty W0 1
CMq q isin 1 NCM minus 2 empty CMq+1 1CMNCMminus1 empty W0 1
Table 92 Example of transition matrix for electricity scenarios
P 1E =
1 0 00 1 00 0 1
P 2
E =
13 13 1313 13 1313 13 13
P 3
E =
06 02 0202 06 0202 02 06
Table 93 Example of transition probabilities on a 12 stages horizon
Stage(k) 0 1 2 3 4 5 6 7 8 9 10 11
Pk(j2 i2) P 1
E P 1E P 1
E P 3E P 3
E P 2E P 2
E P 2E P 3
E P 1E P 1
E P 1E
9144 Cost Function
The costs associated to the possible transitions can be of different kinds
bull Reward for electricity generation= G middotTs middotCE(i2 k) (depends on the electricityscenario state i2 and the stage k)
bull Cost for maintenance CCM or CPM
bull Cost for interruption CI
Moreover a terminal cost noted CN could be used to penalized deviations fromrequired state at the end of time horizon This option and its consequences was notstudied in this work The transition cost are summarized in Table 94 Notice thati2 is a state variable
A possible terminal cost is defined by CN (i) for each possible terminal state CN (i)for the component
54
Table 94 Transition costs
i1 u j1 Ck(j u i)
Wq q isin 0 NW minus 1 0 Wq+1 G middot Ts middot Cel(i2 k)
Wq q isin 0 NW minus 1 0 CM1 CI + CCM
WNW 0 WNW G middot Ts middot CE(i2 k)WNW 0 CM1 CI + CCM
Wq 1 PM1 CI + CPM
PMq q isin 1 NPM minus 2 empty PMq+1 CI + CPM
PMNPMminus1 empty W0 CI + CPM
CMq q isin 1 NCM minus 2 empty CMq+1 CI + CCM
CMNCMminus1 empty W0 CI + CCM
92 Multi-Component model
In this section the model presented in Section 91 is extended to multi-componentssystems
921 Idea of the Model
The motivation for a multi-component model is to consider possible opportunisticmaintenance It is sometimes possible to do maintenance on different parts of thesystem at opportunistic times For example if the system fails it could be profitableto do maintenance on some components of the system that are still working butshould be maintained soon
This could be very interesting if the interruption cost is high or if the structureneeded for the maintenance is very high In wind power for example for certainmaintenance actions an helicopter or a boat can be necessary The price for theirrent can be very high and it could be profitable to group the maintenance of differentwind turbines at the same time
922 Notations for the Proposed Model
Numbers
NC Number of componentNWc Number of working state for component cNPMc Number of Preventive Maintenance state for component cNCMc Number of Corrective Maintenance state for component c
55
Costs
CPMc Cost per stage of Preventive Maintenance for component cCCMc Cost per stage of Corrective Maintenance for component cCNc (i) Terminal cost if the component c is in state i
Variables
ic c isin 1 NC State of component c at the actual stageiNC+1 State for the electricity at the actual stagejc c isin 1 NC State of component c for the next stagejNC+1 State for the electricity for the next stageuc c isin 1 NC Decision variable for component c
State and Control Space
xck c isin 1 NC State of the component c at stage kxc A component state
xNC+1k Electricity state at stage kuck Maintenance for component c at stage k
Probability functions
λc(i) Failure probability function for component c
Sets
Ωxc
State space for component c
ΩxNC+1
Electricity state spaceΩuc
(ic) Decision space for component c in state ic
923 Assumptions
bull The system is composed of NC components in series If one component failsthe whole system fails
bull The failure rate of each component over the time is assumed perfectly knownThis function is noted λc(t) for component c isin 1 NC
bull If component c fails during stage k corrective maintenance is undertaken forNCMc stages with a cost of CCMc per stage
bull It is possible at each stage to decide to replace a component to prevent cor-rective maintenance The time of preventive replacement for component n isNPMc stages with a cost of CPMc per stage
56
bull An interruption cost CI is consider whatever the maintenance is done on thesystem
bull The average production of the generating unit is G kW If none of the compo-nent of the unit is in preventive maintenance or failure G middotTs kWh is producedduring the stage (Ts in hours)
bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c
9.2.4 Model Description

9.2.4.1 State Space
The state of the system can be represented by a vector, as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)T    (9.2)

xck, c ∈ {1, ..., NC} represents the state of component c, and xNC+1,k represents the electricity state.
Component Space
The numbers of CM and PM states of component c are NCMc and NPMc, respectively. The number of W states of each component c, NWc, is decided in the same way as for one component.

The state space of component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity Space
Same as in Section 8.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c
The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)T    (9.3)

The decision space of each decision variable is defined by

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and Ωuc(ic) = ∅ otherwise.
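The per-component decision spaces combine into a joint space by Cartesian product. The sketch below illustrates this combination; the state names and the convention of pinning an in-maintenance component to a single placeholder value 0 are assumptions made for the example, not part of the thesis model:

```python
from itertools import product

# Illustrative working states for every component; PM/CM states such as
# "PM1" or "CM1" count as "in maintenance".
WORKING = {"W0", "W1", "W2"}

def decision_space(ic):
    """Omega_uc(ic): {0, 1} if component c is in a working state, else empty."""
    return (0, 1) if ic in WORKING else ()

def joint_decisions(states):
    """All feasible decision vectors Uk for a tuple of component states.

    A component whose decision space is empty (it is in maintenance) is
    pinned to the placeholder value 0 so the product stays well defined.
    """
    spaces = [decision_space(ic) or (0,) for ic in states]
    return list(product(*spaces))
```

For two working components this yields the four vectors (0, 0), (0, 1), (1, 0), (1, 1); a component under repair contributes no real choice.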
9.2.4.3 Transition Probability

The component states xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 8.1.
Component state transitions

The component states xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.

Case 1: all components are working and no maintenance is done. The transition probability of the whole system is the product of the transition probabilities of the components considered independently. If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)
Case 2: at least one component is in maintenance, or preventive maintenance is decided for at least one component. Then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1             if ic ∉ {W0, ..., WNWc−1} and ic = jc
P^c = 0             otherwise
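One possible reading of this case split as code, with the per-component kernels P(jc, u, ic) supplied as a lookup table. The kernel values and state names are illustrative assumptions; the "frozen" branch (factor 1 when a working component keeps its state while the system is down) follows the second line of the definition of P^c:

```python
def system_transition_prob(j, u, i, kernel, ageing):
    """Product-form transition probability for the series system (sketch).

    j, u, i : tuples over components (next states, decisions, current states)
    kernel  : kernel[(jc, uc, ic)] -> per-component probability P(jc, uc, ic)
    ageing  : working states W1..WNWc in which a component ages
    """
    case1 = all(ic in ageing for ic in i) and all(uc == 0 for uc in u)
    prob = 1.0
    for jc, uc, ic in zip(j, u, i):
        if case1:
            prob *= kernel.get((jc, 0, ic), 0.0)   # independent ageing
        elif uc == 1 or ic not in ageing:
            prob *= kernel.get((jc, 1, ic), 0.0)   # maintenance transition
        elif jc == ic:
            pass                                    # frozen component: factor 1
        else:
            return 0.0                              # forbidden transition
    return prob

# Two identical components, one ageing state W1, failure probability 0.1
kernel = {("W1", 0, "W1"): 0.9, ("CM1", 0, "W1"): 0.1, ("PM1", 1, "W1"): 1.0}
```

With both components working and no decision, the system probability is 0.9 · 0.9 = 0.81; if preventive maintenance is started on the first component, the second is frozen in W1 with probability 1.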
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1: all components are working, no maintenance is decided and no failure happens. A reward for the electricity produced is obtained. If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2: the system is in maintenance or fails during the stage. An interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with

Cc = CCMc  if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc  if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0     otherwise
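The cost split can be sketched the same way. All numbers, state names and parameter names below are illustrative assumptions; the thesis fixes only the structure (production reward in Case 1, interruption cost plus per-component maintenance costs in Case 2). Here the reward is booked with a negative sign so that a single quantity can be minimized:

```python
def stage_cost(j, u, i, i_elec, k, par):
    """Stage cost C(j, u, i) of the multi-component model (sketch).

    par bundles: ageing (working states), cm/pm (per-component maintenance
    state sets), CCM/CPM (per-component costs), CI, G, Ts and CE(scenario, k).
    """
    case1 = (all(ic in par["ageing"] for ic in i)
             and all(uc == 0 for uc in u)
             and all(jc in par["ageing"] for jc in j))
    if case1:
        return -par["G"] * par["Ts"] * par["CE"](i_elec, k)  # production reward
    cost = par["CI"]
    for c, (jc, ic) in enumerate(zip(j, i)):
        if ic in par["cm"][c] or jc == "CM1":
            cost += par["CCM"][c]
        elif ic in par["pm"][c] or jc == "PM1":
            cost += par["CPM"][c]
    return cost

par = {"ageing": {"W1", "W2"}, "cm": [{"CM1"}], "pm": [{"PM1"}],
       "CCM": [200.0], "CPM": [100.0], "CI": 500.0,
       "G": 1000.0, "Ts": 6.0, "CE": lambda s, k: 0.3}
```

For a single-component system that produces all stage, the stage value is the (negated) reward; once the component is in CM1, the stage cost is CI + CCM = 700 with these illustrative figures.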
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas that could impact the model:

• Manpower: it could be interesting to limit the number of maintenance actions that can be carried out at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities between the maintenance states.

• Deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecast states: it could be interesting to add other forecast state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, covering finite horizon and infinite horizon stochastic approaches as well as Approximate Dynamic Programming (Reinforcement Learning) methods for solving infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.
The main limitation of Dynamic Programming is the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With recent advances in ADP methods, this limitation could be overcome. Until now these methods have mainly been applied to optimal control, but there are new opportunities for applying them to fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP could, for example, be generalized to multi-variable models in which different parameters of a system are monitored.
In the power industry, maintenance contracts over a finite time are common. From this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either a finite horizon model directly, or a discounted infinite horizon model, which approximates a finite horizon model and must be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to the monitoring of single components (possibly monitoring several parameters), while the finite horizon approach could use the results of the single-component models to optimize the maintenance of the complete system. The components in the finite horizon model could be reduced to a small number of deterioration/age states to limit the complexity of the model.
Appendix A

Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
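The backward recursion above is easy to machine-check. The script below re-implements value iteration on the example's arc costs C(k, i, j), taken from the numbers in the computation; states are indexed 0 to 2 per stage, and the optimal decision is taken to be the minimizing successor state:

```python
# Arc costs C[(k, i, j)] of the shortest-path example (missing key = no arc)
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,   # A -> B, C, D
    (1, 0, 0): 4, (1, 0, 1): 6,                 # B -> E, F
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,   # C -> E, F, G
    (1, 2, 1): 5, (1, 2, 2): 2,                 # D -> F, G
    (2, 0, 0): 2, (2, 0, 1): 5,                 # E -> H, I
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,   # F -> H, I, J
    (2, 2, 1): 1, (2, 2, 2): 2,                 # G -> I, J
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,   # H, I, J -> terminal
}

def value_iteration(C, n_stages=4, states=(0, 1, 2)):
    """Backward recursion J_k(i) = min_j [J_{k+1}(j) + C(k, i, j)]."""
    J = {(n_stages, 0): 0.0}   # terminal cost phi(0) = 0
    policy = {}
    for k in range(n_stages - 1, -1, -1):
        for i in states:
            options = {j: J[(k + 1, j)] + C[(k, i, j)]
                       for j in states if (k, i, j) in C and (k + 1, j) in J}
            if options:
                best = min(options, key=options.get)
                J[(k, i)] = options[best]
                policy[(k, i)] = best
    return J, policy
```

Running it reproduces J*0(0) = 8 with the optimal first move to state 2 (node D).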
Electricity scenario state
Electricity scenarios are associated with one state variable, x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively a dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.
[Figure 9.2: Example of electricity scenarios, NE = 3. The figure plots the electricity price (SEK/MWh, roughly 200 to 500) against the stage (k−1, k, k+1) for Scenarios 1, 2 and 3.]
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and ΩU(i) = ∅ otherwise.
9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)
Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities of the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.
Table 9.1: Transition probabilities

i1                       | u | j1    | P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}   | 0 | Wq+1  | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}   | 0 | CM1   | λ(Wq)
WNW                      | 0 | WNW   | 1 − λ(WNW)
WNW                      | 0 | CM1   | λ(WNW)
Wq, q ∈ {0, ..., NW}     | 1 | PM1   | 1
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | 1
PMNPM−1                  | ∅ | W0    | 1
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | 1
CMNCM−1                  | ∅ | W0    | 1
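Table 9.1 determines the whole u = 0 transition matrix once NW, NPM, NCM and the failure-rate function are given. The sketch below assembles it and lets one check row-stochasticity; the state ordering and the λ values used in the check are illustrative choices:

```python
def build_no_pm_matrix(n_w, n_pm, n_cm, lam):
    """Transition matrix for u = 0, following Table 9.1 (sketch).

    States are ordered W0..W_NW, PM1..PM_{NPM-1}, CM1..CM_{NCM-1};
    lam(q) is the failure probability from working state Wq.
    """
    states = ([f"W{q}" for q in range(n_w + 1)]
              + [f"PM{q}" for q in range(1, n_pm)]
              + [f"CM{q}" for q in range(1, n_cm)])
    idx = {s: c for c, s in enumerate(states)}
    n = len(states)
    P = [[0.0] * n for _ in range(n)]
    for q in range(n_w + 1):
        nxt = f"W{q + 1}" if q < n_w else f"W{n_w}"     # W_NW ages no further
        P[idx[f"W{q}"]][idx[nxt]] = 1.0 - lam(q)
        P[idx[f"W{q}"]][idx["CM1" if n_cm > 1 else "W0"]] = lam(q)
    for q in range(1, n_pm):                            # deterministic PM chain
        P[idx[f"PM{q}"]][idx[f"PM{q + 1}" if q < n_pm - 1 else "W0"]] = 1.0
    for q in range(1, n_cm):                            # deterministic CM chain
        P[idx[f"CM{q}"]][idx[f"CM{q + 1}" if q < n_cm - 1 else "W0"]] = 1.0
    return states, P
```

For example, with NW = 3, NPM = NCM = 2 and λ(q) = 0.1 · (q + 1), state W0 moves to W1 with probability 0.9 and to CM1 with probability 0.1.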
Table 9.2: Example of transition matrices for the electricity scenarios

P1E = [ 1 0 0 ; 0 1 0 ; 0 0 1 ]
P2E = [ 1/3 1/3 1/3 ; 1/3 1/3 1/3 ; 1/3 1/3 1/3 ]
P3E = [ 0.6 0.2 0.2 ; 0.2 0.6 0.2 ; 0.2 0.2 0.6 ]

(rows separated by semicolons; row i2, column j2)
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):   0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):  P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
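Tables 9.2 and 9.3 together define a stage-dependent Markov chain. A short sketch propagates a scenario distribution through the 12 stages (matrices copied from Table 9.2; the initial distribution is an illustrative choice):

```python
P1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2 = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Table 9.3: which matrix applies at each of the 12 stages
schedule = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

def propagate(dist, schedule):
    """Push a scenario distribution through the stage-dependent chain."""
    for P in schedule:
        dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]
    return dist
```

Starting from scenario S1 with certainty, the first three identity stages leave the distribution unchanged; once the uniform matrix P2E is applied, the scenario distribution becomes (and stays) uniform.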
9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost of maintenance: CCM or CPM

• Cost of interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4; notice that i2 is a state variable. A possible terminal cost CN(i1) is defined for each possible terminal state of the component.
Table 9.4: Transition costs

i1                       | u | j1    | Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}   | 0 | Wq+1  | G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}   | 0 | CM1   | CI + CCM
WNW                      | 0 | WNW   | G · Ts · CE(i2, k)
WNW                      | 0 | CM1   | CI + CCM
Wq                       | 1 | PM1   | CI + CPM
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | CI + CPM
PMNPM−1                  | ∅ | W0    | CI + CPM
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | CI + CCM
CMNCM−1                  | ∅ | W0    | CI + CCM
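Reading Tables 9.1 and 9.4 together gives the expected one-stage value of being in a working state Wq with u = 0: with probability 1 − λ(Wq) the unit produces and earns G · Ts · CE, and with probability λ(Wq) it fails and pays CI + CCM. A quick numeric check; all figures are illustrative, and the production term is booked as a gain while the failure term is booked as a loss:

```python
def expected_stage_value(lam_q, G, Ts, ce, CI, CCM):
    """Expected one-stage value in working state Wq under u = 0."""
    return (1.0 - lam_q) * G * Ts * ce - lam_q * (CI + CCM)
```

With λ(Wq) = 0.1, G = 1000, Ts = 6, a price of 0.5 per kWh, CI = 2000 and CCM = 1000, the expectation is 0.9 · 3000 − 0.1 · 3000 = 2400.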
9.2 Multi-Component Model
In this section, the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it could be profitable to also do maintenance on components that are still working but should be maintained soon.
[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994
[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965
[8] R Bellman Dynamic Programming Princeton University Press Princeton1957
[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997
[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976
[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979
65
[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005
[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996
[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006
[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991
[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997
[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966
[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004
[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982
[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004
[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004
[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004
[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996
[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999
66
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
67
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005
68
9.1.4.2 Decision Space
At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance,
U_k = 1: preventive maintenance.

The decision space depends only on the component state i^1:

$$\Omega_U(i) = \begin{cases} \{0, 1\} & \text{if } i^1 \in \{W_1, \dots, W_{N_W}\} \\ \emptyset & \text{otherwise} \end{cases}$$
9.1.4.3 Transition Probabilities
The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

$$P(X_{k+1} = j \mid U_k = u, X_k = i) = P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 \mid u_k = u, x^1_k = i^1, x^2_k = i^2)$$
$$= P(x^1_{k+1} = j^1 \mid u_k = u, x^1_k = i^1) \cdot P(x^2_{k+1} = j^2 \mid x^2_k = i^2) = P(j^1, u, i^1) \cdot P_k(j^2, i^2)$$
Component state transition probabilities

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the time of the stage and equal to λ(W_q) = λ(q · T_s).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero. Note that if N_PM = 1 or N_CM = 1, then PM_1, respectively CM_1, corresponds to W_0.
Electricity state

The transition probabilities of the electricity state, P_k(j^2, i^2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j^2, i^2) can take three different values, defined by the transition matrices P^1_E, P^2_E and P^3_E; i^2 is represented by the rows of the matrices and j^2 by the columns.
Table 9.1: Transition probabilities

i^1                           u   j^1        P(j^1, u, i^1)
W_q, q ∈ {0, …, N_W − 1}      0   W_{q+1}    1 − λ(W_q)
W_q, q ∈ {0, …, N_W − 1}      0   CM_1       λ(W_q)
W_{N_W}                       0   W_{N_W}    1 − λ(W_{N_W})
W_{N_W}                       0   CM_1       λ(W_{N_W})
W_q, q ∈ {0, …, N_W}          1   PM_1       1
PM_q, q ∈ {1, …, N_PM − 2}    ∅   PM_{q+1}   1
PM_{N_PM−1}                   ∅   W_0        1
CM_q, q ∈ {1, …, N_CM − 2}    ∅   CM_{q+1}   1
CM_{N_CM−1}                   ∅   W_0        1
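As a concrete illustration, the non-zero entries of Table 9.1 can be generated programmatically. The sketch below is not from the thesis: the function name, the string state labels and the `failure_prob` callback are assumptions, and it requires N_PM, N_CM ≥ 2 so that PM_1 and CM_1 are distinct from W_0.

```python
def component_transitions(n_w, n_pm, n_cm, failure_prob):
    """Non-zero transition probabilities P(j1, u, i1) of Table 9.1.

    States are labelled 'W0'..'W{n_w}', 'PM1'..'PM{n_pm-1}', 'CM1'..'CM{n_cm-1}'.
    failure_prob(q) returns lambda(W_q).  Maintenance states carry the empty
    decision, encoded here as u = None.  Requires n_pm, n_cm >= 2.
    """
    P = {}
    for q in range(n_w + 1):
        lam = failure_prob(q)
        ageing = f"W{q + 1}" if q < n_w else f"W{n_w}"  # W_NW stays in W_NW
        P[(f"W{q}", 0)] = {ageing: 1.0 - lam, "CM1": lam}
        P[(f"W{q}", 1)] = {"PM1": 1.0}                  # preventive replacement starts
    for kind, n in (("PM", n_pm), ("CM", n_cm)):
        for q in range(1, n - 1):                       # maintenance progresses...
            P[(f"{kind}{q}", None)] = {f"{kind}{q + 1}": 1.0}
        P[(f"{kind}{n - 1}", None)] = {"W0": 1.0}       # ...and ends in W0
    return P
```

Every row of the resulting kernel sums to one, which gives a quick sanity check on the table.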
Table 9.2: Example of transition matrices for electricity scenarios

$$P^1_E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad P^2_E = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}, \quad P^3_E = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.2 & 0.6 & 0.2 \\ 0.2 & 0.2 & 0.6 \end{pmatrix}$$
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)       0     1     2     3     4     5     6     7     8     9     10    11
P_k(j^2, i^2)   P^1_E P^1_E P^1_E P^3_E P^3_E P^2_E P^2_E P^2_E P^3_E P^1_E P^1_E P^1_E
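The scenario matrices of Table 9.2 and the stage schedule of Table 9.3 translate directly into code. This is a minimal sketch under assumed names, not code from the thesis:

```python
import numpy as np

P1_E = np.eye(3)                      # scenarios frozen in place
P2_E = np.full((3, 3), 1.0 / 3.0)     # scenarios fully mixed
P3_E = np.array([[0.6, 0.2, 0.2],
                 [0.2, 0.6, 0.2],
                 [0.2, 0.2, 0.6]])    # scenarios weakly mixed

# Table 9.3: which matrix acts at each of the 12 stages (k = 0..11).
schedule = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def electricity_transition(k, i2, j2):
    """P_k(j2, i2): probability of moving from scenario i2 to j2 at stage k."""
    return schedule[k][i2, j2]
```

Since each matrix is row-stochastic, `schedule[k].sum(axis=1)` is all ones for every stage.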
9.1.4.4 Cost Function
The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · T_s · C_E(i^2, k) (depends on the electricity scenario state i^2 and the stage k)
• Cost for maintenance: C_CM or C_PM
• Cost for interruption: C_I

Moreover, a terminal cost, noted C^N, could be used to penalize deviations from the required state at the end of the time horizon; a possible terminal cost would be defined by C^N(i^1) for each possible terminal state i^1 of the component. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i^2 is a state variable.
Table 9.4: Transition costs

i^1                           u   j^1        C_k(j, u, i)
W_q, q ∈ {0, …, N_W − 1}      0   W_{q+1}    G · T_s · C_E(i^2, k)
W_q, q ∈ {0, …, N_W − 1}      0   CM_1       C_I + C_CM
W_{N_W}                       0   W_{N_W}    G · T_s · C_E(i^2, k)
W_{N_W}                       0   CM_1       C_I + C_CM
W_q                           1   PM_1       C_I + C_PM
PM_q, q ∈ {1, …, N_PM − 2}    ∅   PM_{q+1}   C_I + C_PM
PM_{N_PM−1}                   ∅   W_0        C_I + C_PM
CM_q, q ∈ {1, …, N_CM − 2}    ∅   CM_{q+1}   C_I + C_CM
CM_{N_CM−1}                   ∅   W_0        C_I + C_CM
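The cost rule of Table 9.4 can be sketched as a small function. The name and argument layout are assumptions, and the reward term G · T_s · C_E(i^2, k) is passed in pre-computed as `reward`:

```python
def transition_cost(i1, u, j1, reward, C_I, C_CM, C_PM):
    """Transition cost C_k(j, u, i) following Table 9.4 for one component."""
    if u == 1 or str(i1).startswith("PM"):
        return C_I + C_PM      # preventive maintenance starts or continues
    if j1 == "CM1" or str(i1).startswith("CM"):
        return C_I + C_CM      # failure occurs or corrective maintenance continues
    return reward              # working -> working: electricity is produced
```

In a cost-minimization formulation the production reward would enter with a negative sign; the table presents it as a reward entry, so the sign convention is left to the caller here.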
9.2 Multi-Component Model
In this section the model presented in Section 9.1 is extended to multi-component systems.
9.2.1 Idea of the Model
The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon anyway.

This can be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can then be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model
Numbers

N_C: Number of components
N_W^c: Number of working states for component c
N_PM^c: Number of preventive maintenance states for component c
N_CM^c: Number of corrective maintenance states for component c

Costs

C_PM^c: Cost per stage of preventive maintenance for component c
C_CM^c: Cost per stage of corrective maintenance for component c
C^N_c(i): Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, …, N_C}: State of component c at the actual stage
i^{N_C+1}: State of the electricity at the actual stage
j^c, c ∈ {1, …, N_C}: State of component c at the next stage
j^{N_C+1}: State of the electricity at the next stage
u^c, c ∈ {1, …, N_C}: Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, …, N_C}: State of component c at stage k
x^c: A component state
x^{N_C+1}_k: Electricity state at stage k
u^c_k: Maintenance decision for component c at stage k

Probability Functions

λ^c(i): Failure probability function for component c

Sets

Ω_{x^c}: State space for component c
Ω_{x^{N_C+1}}: Electricity state space
Ω_{u^c}(i^c): Decision space for component c in state i^c
9.2.3 Assumptions
• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λ^c(t) for component c ∈ {1, …, N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CM^c stages, with a cost of C_CM^c per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PM^c stages, with a cost of C_PM^c per stage.

• An interruption cost C_I is incurred whenever maintenance, of whatever type, is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · T_s kWh is produced during the stage (T_s in hours).

• A terminal cost C^N_c can be used to penalize the terminal stage condition of component c.
9.2.4 Model Description
9.2.4.1 State Space
The state of the system can be represented by a vector, as in (9.2):

$$X_k = \begin{pmatrix} x^1_k \\ \vdots \\ x^{N_C}_k \\ x^{N_C+1}_k \end{pmatrix} \qquad (9.2)$$

x^c_k, c ∈ {1, …, N_C}, represents the state of component c, and x^{N_C+1}_k represents the electricity state.
Component space

The numbers of CM and PM states for component c correspond to N_CM^c and N_PM^c, respectively. The number of working states N_W^c for each component c is decided in the same way as for the one-component model.

The state space related to component c is noted Ω_{x^c}:

$$x^c_k \in \Omega_{x^c} = \{W_0, \dots, W_{N_W^c}, PM_1, \dots, PM_{N_PM^c - 1}, CM_1, \dots, CM_{N_CM^c - 1}\}$$

Electricity space

Same as in Section 8.1.
9.2.4.2 Decision Space
At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

u^c_k = 0: no preventive maintenance on component c,
u^c_k = 1: preventive maintenance on component c.

The decision variables constitute a decision vector:

$$U_k = \begin{pmatrix} u^1_k \\ u^2_k \\ \vdots \\ u^{N_C}_k \end{pmatrix} \qquad (9.3)$$

The decision space for each decision variable is defined by:

$$\forall c \in \{1, \dots, N_C\}: \quad \Omega_{u^c}(i^c) = \begin{cases} \{0, 1\} & \text{if } i^c \in \{W_0, \dots, W_{N_W^c}\} \\ \emptyset & \text{otherwise} \end{cases}$$
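The joint decision space is the Cartesian product of the per-component decision spaces. A minimal sketch, where the `working` predicate and the `None` encoding of the empty decision are assumptions of mine:

```python
from itertools import product

def joint_decision_space(states, working):
    """All feasible decision vectors (u1, ..., uNC) for a given system state.

    A component may be selected for preventive maintenance (u = 1) only while
    it is in a working state; otherwise its decision space is empty, encoded
    here by the placeholder None.
    """
    options = [(0, 1) if working(s) else (None,) for s in states]
    return list(product(*options))
```

For a system with all N_C components working this enumerates the 2^{N_C} possible decision vectors, which is the combinatorial growth behind the curse of dimensionality discussed in the conclusions.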
9.2.4.3 Transition Probabilities
The state variables x^c are independent of the electricity state x^{N_C+1}. Consequently:

$$P(X_{k+1} = j \mid U_k = U, X_k = i) \qquad (9.4)$$
$$= P((j^1, \dots, j^{N_C}), (u^1, \dots, u^{N_C}), (i^1, \dots, i^{N_C})) \cdot P(j^{N_C+1}, i^{N_C+1}) \qquad (9.5)$$

The transition probabilities of the electricity state, P(j^{N_C+1}, i^{N_C+1}), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 8.1.
Component state transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, …, N_C}: x^c_k ∈ {W_1, …, W_{N_W^c}}:

$$P((j^1, \dots, j^{N_C}), 0, (i^1, \dots, i^{N_C})) = \prod_{c=1}^{N_C} P(j^c, 0, i^c)$$
Case 2

If one of the components is in maintenance or fails, or if preventive maintenance is decided:

$$P((j^1, \dots, j^{N_C}), (u^1, \dots, u^{N_C}), (i^1, \dots, i^{N_C})) = \prod_{c=1}^{N_C} P^c$$

with

$$P^c = \begin{cases} P(j^c, 1, i^c) & \text{if } u^c = 1 \text{ or } i^c \notin \{W_1, \dots, W_{N_W^c}\} \\ 1 & \text{if } i^c \in \{W_1, \dots, W_{N_W^c}\},\ u^c = 0 \text{ and } i^c = j^c \\ 0 & \text{otherwise} \end{cases}$$
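The two cases can be combined into one routine. This is a sketch, not code from the thesis: `p_comp(c, j, u, i)` stands for the single-component kernel P(j, u, i) of component c, and `working` tests membership of the working states.

```python
def joint_transition_prob(i, u, j, p_comp, working):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) for the multi-component model."""
    # Case 1 applies when every component works and no maintenance is decided.
    system_up = all(working(s) for s in i) and not any(u)
    p = 1.0
    for c, (ic, uc, jc) in enumerate(zip(i, u, j)):
        if system_up:
            p *= p_comp(c, jc, 0, ic)        # components age independently
        elif uc == 1 or not working(ic):
            p *= p_comp(c, jc, uc, ic)       # maintenance transition
        else:
            p *= 1.0 if jc == ic else 0.0    # frozen: system is down, no ageing
    return p
```

The last branch captures the dependence between components: a working, unmaintained component keeps its state with probability one while the system is stopped.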
9.2.4.4 Cost Function
As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, …, N_C}: x^c_k ∈ {W_1, …, W_{N_W^c}}:

$$C((j^1, \dots, j^{N_C}), 0, (i^1, \dots, i^{N_C})) = G \cdot T_s \cdot C_E(i^{N_C+1}, k)$$

Case 2

When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

$$C((j^1, \dots, j^{N_C}), (u^1, \dots, u^{N_C}), (i^1, \dots, i^{N_C})) = C_I + \sum_{c=1}^{N_C} C^c$$

with

$$C^c = \begin{cases} C_{CM}^c & \text{if } i^c \in \{CM_1, \dots, CM_{N_CM^c - 1}\} \text{ or } j^c = CM_1 \\ C_{PM}^c & \text{if } i^c \in \{PM_1, \dots, PM_{N_PM^c - 1}\} \text{ or } j^c = PM_1 \\ 0 & \text{otherwise} \end{cases}$$
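The corresponding cost routine can be sketched as follows; this is illustrative, with per-component costs taken equal across components for brevity (a simplification of the thesis's C_CM^c and C_PM^c), and the Case 1 reward passed in pre-computed:

```python
def system_cost(i, u, j, working, C_I, C_CM, C_PM, reward):
    """C((j1..jNC), (u1..uNC), (i1..iNC)) for the multi-component model."""
    # Case 1: the system runs the whole stage and produces electricity.
    if all(working(s) for s in i) and not any(u) and all(working(s) for s in j):
        return reward                    # stands for G * Ts * CE(i_{NC+1}, k)
    # Case 2: interruption cost plus the cost of every maintenance action.
    total = C_I
    for ic, uc, jc in zip(i, u, j):
        if str(ic).startswith("CM") or jc == "CM1":
            total += C_CM
        elif uc == 1 or str(ic).startswith("PM") or jc == "PM1":
            total += C_PM
    return total
```

Note that a single interruption cost C_I covers the whole stage, which is exactly what makes grouped (opportunistic) maintenance attractive in this model.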
9.3 Possible Extensions
The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Stochastic time to repair. The time to repair is in reality not deterministic. A stochastic repair time could be modelled by adding transition probabilities to the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm empirically converges fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) was found in the literature, only a proposition of an application.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4, u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2, u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7, u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
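The computation above can be checked mechanically. The sketch below re-implements the backward value iteration on the same arc costs C(k, i, u), transcribed from the calculations; the function and variable names are my own, not from the thesis.

```python
# Arc costs C[(k, i)][u], transcribed from the computation above: at stage k,
# state i, decision u leads to stage-(k+1) state u (stage 3 goes to the terminal).
C = {
    (0, 0): {0: 2, 1: 4, 2: 3},
    (1, 0): {0: 4, 1: 6},
    (1, 1): {0: 2, 1: 1, 2: 3},
    (1, 2): {1: 5, 2: 2},
    (2, 0): {0: 2, 1: 5},
    (2, 1): {0: 7, 1: 3, 2: 2},
    (2, 2): {1: 1, 2: 2},
    (3, 0): {0: 4},
    (3, 1): {0: 2},
    (3, 2): {0: 7},
}

def shortest_path():
    """Backward value iteration; returns stage-0 costs-to-go and the policy."""
    J = {0: 0.0}                                   # terminal cost phi(0) = 0
    policy = {}
    for k in range(3, -1, -1):
        Jk = {}
        for (stage, i), arcs in C.items():
            if stage != k:
                continue
            succ = (lambda u: 0) if k == 3 else (lambda u: u)
            best = min(arcs, key=lambda u: arcs[u] + J[succ(u)])
            Jk[i] = arcs[best] + J[succ(best)]
            policy[(k, i)] = best
        J = Jk
    return J, policy
```

Running it reproduces J*_0(0) = 8 with the optimal first decision u*_0(0) = 2, i.e. the shortest path A–D–G–I.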
Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Computers and Operations Research, 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Table 91 Transition probabilities
i1 u j1 P (j1 u i1)
Wq q isin 0 NW minus 1 0 Wq+1 1minus λ(Wq)Wq q isin 0 NW minus 1 0 CM1 λ(Wq)WNW 0 WNW 1minus λ(WNW )WNW 0 CM1 λ(WNW )Wq q isin 0 NW 1 PM1 1
PMq q isin 1 NPM minus 2 empty PMq+1 1PMNPMminus1 empty W0 1
CMq q isin 1 NCM minus 2 empty CMq+1 1CMNCMminus1 empty W0 1
Table 9.2: Example of transition matrices for electricity scenarios

P_E^1 = [ 1   0   0  ]     P_E^2 = [ 1/3 1/3 1/3 ]     P_E^3 = [ 0.6 0.2 0.2 ]
        [ 0   1   0  ]             [ 1/3 1/3 1/3 ]             [ 0.2 0.6 0.2 ]
        [ 0   0   1  ]             [ 1/3 1/3 1/3 ]             [ 0.2 0.2 0.6 ]
Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)      0      1      2      3      4      5      6      7      8      9      10     11
P_k(j2, i2)    P_E^1  P_E^1  P_E^1  P_E^3  P_E^3  P_E^2  P_E^2  P_E^2  P_E^3  P_E^1  P_E^1  P_E^1
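To make the schedule concrete, the scenario distribution can be propagated through the 12 stages. The initial scenario below is an assumption; note that P_E^2 maps any distribution to the uniform one, so the horizon ends with the three scenarios equally likely:

```python
# Propagate the electricity-scenario distribution through the stage
# schedule of Table 9.3 (matrices P_E^1..P_E^3 from Table 9.2).
P1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2 = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
schedule = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

def step(dist, P):
    """One stage of the Markov chain: dist'[j] = sum_i dist[i] * P[i][j]."""
    return [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

dist = [1.0, 0.0, 0.0]     # assumed: scenario 1 is certain at stage 0
for P in schedule:
    dist = step(dist, P)
# the uniform matrix P_E^2 erases the history, so the chain ends uniform
print(dist)
```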
9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · C_E(i2, k) (depends on the electricity scenario state i2 and the stage k)
• Cost of maintenance: C_CM or C_PM
• Cost of interruption: C_I

Moreover, a terminal cost, noted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by C_N(i) for each possible terminal state i of the component.
Table 9.4: Transition costs

i1                          u    j1       C_k(j, u, i)
W_q, q ∈ {0, …, N_W−1}      0    W_{q+1}  G · Ts · C_E(i2, k)
W_q, q ∈ {0, …, N_W−1}      0    CM_1     C_I + C_CM
W_{N_W}                     0    W_{N_W}  G · Ts · C_E(i2, k)
W_{N_W}                     0    CM_1     C_I + C_CM
W_q, q ∈ {0, …, N_W}        1    PM_1     C_I + C_PM
PM_q, q ∈ {1, …, N_PM−2}    ∅    PM_{q+1} C_I + C_PM
PM_{N_PM−1}                 ∅    W_0      C_I + C_PM
CM_q, q ∈ {1, …, N_CM−2}    ∅    CM_{q+1} C_I + C_CM
CM_{N_CM−1}                 ∅    W_0      C_I + C_CM
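Table 9.4 can be sketched as a stage cost function. All numeric values below are assumed for illustration, and the production reward is written as a negative cost here so that the objective is a pure minimization (an assumption about the sign convention):

```python
# Sketch of the transition costs of Table 9.4 for the one-component model.
# G, Ts and the cost/price constants are assumed illustrative values.
G, Ts = 2000.0, 730.0                     # kW, hours per stage (assumed)
C_I, C_PM, C_CM = 5000.0, 1000.0, 3000.0  # assumed costs per stage

def C_E(i2, k):
    """Assumed electricity price (per kWh) for scenario i2 at stage k."""
    return [0.03125, 0.0625, 0.125][i2]

def stage_cost(i1, u, j1, i2, k):
    if i1.startswith("W") and u == 0 and j1.startswith("W"):
        return -G * Ts * C_E(i2, k)       # production reward (negative cost)
    if j1 == "CM1" or i1.startswith("CM"):
        return C_I + C_CM                 # failure / corrective maintenance
    return C_I + C_PM                     # preventive maintenance

print(stage_cost("W2", 0, "W3", 1, 0))    # -91250.0
```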
9.2 Multi-Component model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it could be profitable to also do maintenance on components of the system that are still working but would need maintenance soon.

This can be very interesting if the interruption cost is high or if the infrastructure needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it can then be profitable to group the maintenance of different wind turbines at the same time.
9.2.2 Notations for the Proposed Model

Numbers

N_C       Number of components
N_Wc      Number of working states for component c
N_PMc     Number of Preventive Maintenance states for component c
N_CMc     Number of Corrective Maintenance states for component c

Costs

C_PMc     Cost per stage of Preventive Maintenance for component c
C_CMc     Cost per stage of Corrective Maintenance for component c
C_Nc(i)   Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, …, N_C}   State of component c at the current stage
i^{N_C+1}              Electricity state at the current stage
j^c, c ∈ {1, …, N_C}   State of component c at the next stage
j^{N_C+1}              Electricity state at the next stage
u^c, c ∈ {1, …, N_C}   Decision variable for component c

State and Control Space

x_k^c, c ∈ {1, …, N_C}   State of component c at stage k
x^c                      A component state
x_k^{N_C+1}              Electricity state at stage k
u_k^c                    Maintenance decision for component c at stage k

Probability functions

λ_c(i)    Failure probability function for component c

Sets

Ω_{x^c}          State space for component c
Ω_{x^{N_C+1}}    Electricity state space
Ω_{u^c}(i^c)     Decision space for component c in state i^c
9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.
• The failure rate of each component over time is assumed perfectly known. This function is noted λ_c(t) for component c ∈ {1, …, N_C}.
• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages with a cost of C_CMc per stage.
• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages with a cost of C_PMc per stage.
• An interruption cost C_I is incurred whenever maintenance is done on the system.
• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).
• A terminal cost C_Nc can be used to penalize the terminal state of component c.
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x_k^1, …, x_k^{N_C}, x_k^{N_C+1})^T        (9.2)

x_k^c, c ∈ {1, …, N_C}, represents the state of component c, and x_k^{N_C+1} represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is noted Ω_{x^c}:

x_k^c ∈ Ω_{x^c} = {W_0, …, W_{N_Wc}, PM_1, …, PM_{N_PMc−1}, CM_1, …, CM_{N_CMc−1}}
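The component state space above can be enumerated directly; the sizes used below are illustrative assumptions:

```python
# Enumerate the component state space Omega_{x^c} for assumed sizes.
def component_states(N_Wc, N_PMc, N_CMc):
    return ([f"W{q}" for q in range(N_Wc + 1)]
            + [f"PM{q}" for q in range(1, N_PMc)]
            + [f"CM{q}" for q in range(1, N_CMc)])

print(component_states(2, 2, 3))
# ['W0', 'W1', 'W2', 'PM1', 'CM1', 'CM2']
```

The full system state space is the Cartesian product of the N_C component spaces and the electricity space, which is what makes the multi-component model grow quickly with N_C.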
Electricity Space
Same as in Section 8.1.
9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

u_k^c = 0: no preventive maintenance on component c
u_k^c = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u_k^1, u_k^2, …, u_k^{N_C})^T        (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, …, N_C}:  Ω_{u^c}(i^c) = {0, 1} if i^c ∈ {W_0, …, W_{N_Wc}},  ∅ otherwise
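This decision space can be sketched as a one-line rule (the string-based state encoding is an assumption carried over from the earlier sketches):

```python
# Sketch of the decision space Omega_{u^c}(i^c): preventive maintenance
# can only be scheduled for a component in a working state.
def decision_space(ic):
    return {0, 1} if ic.startswith("W") else set()

print(decision_space("W2"))    # {0, 1}
print(decision_space("CM1"))   # set()
```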
9.2.4.3 Transition Probability

The state variables x^c are independent of the electricity state x^{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)        (9.4)
= P((j^1, …, j^{N_C}), (u^1, …, u^{N_C}), (i^1, …, i^{N_C})) · P(j^{N_C+1}, i^{N_C+1})        (9.5)

The transition probabilities of the electricity states, P(j^{N_C+1}, i^{N_C+1}), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 8.1.

Component state transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of the components considered independently.

If ∀c ∈ {1, …, N_C}: i^c ∈ {W_0, …, W_{N_Wc}}, then

P((j^1, …, j^{N_C}), 0, (i^1, …, i^{N_C})) = ∏_{c=1}^{N_C} P(j^c, 0, i^c)
Case 2
If at least one component is in maintenance, or preventive maintenance is decided for some component, the system is stopped during the stage and

P((j^1, …, j^{N_C}), (u^1, …, u^{N_C}), (i^1, …, i^{N_C})) = ∏_{c=1}^{N_C} P^c

with

P^c = { P(j^c, u^c, i^c)   if u^c = 1 or i^c ∉ {W_0, …, W_{N_Wc}}
      { 1                  if u^c = 0, i^c ∈ {W_0, …, W_{N_Wc}} and j^c = i^c
      { 0                  otherwise
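The two cases can be sketched as follows. The per-component rule p_c(j, u, i) and the predicate working(i) are assumed to be supplied (e.g. by the one-component model of Section 9.1):

```python
# Sketch of the two transition cases of Section 9.2.4.3 for a system of
# NC series components; p_c and working are assumed helper functions.
def system_transition_prob(j, u, i, p_c, working):
    """P((j^1..j^NC), (u^1..u^NC), (i^1..i^NC))."""
    if all(working(ic) for ic in i) and not any(u):
        # Case 1: the system runs and the components age independently
        prob = 1.0
        for jc, ic in zip(j, i):
            prob *= p_c(jc, 0, ic)
        return prob
    # Case 2: the system is stopped; components in (or entering)
    # maintenance advance, the remaining working components are frozen
    prob = 1.0
    for jc, uc, ic in zip(j, u, i):
        if uc == 1 or not working(ic):
            prob *= p_c(jc, uc, ic)
        elif jc == ic:
            pass                    # frozen in place with probability 1
        else:
            return 0.0
    return prob
```

The early `return 0.0` encodes the third branch of the P^c definition: a working, unmaintained component cannot change state while the system is stopped.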
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, …, N_C}: i^c ∈ {W_0, …, W_{N_Wc}}, then

C((j^1, …, j^{N_C}), 0, (i^1, …, i^{N_C})) = G · Ts · C_E(i^{N_C+1}, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

C((j^1, …, j^{N_C}), (u^1, …, u^{N_C}), (i^1, …, i^{N_C})) = C_I + ∑_{c=1}^{N_C} C^c

with

C^c = { C_CMc   if i^c ∈ {CM_1, …, CM_{N_CMc−1}} or j^c = CM_1
      { C_PMc   if i^c ∈ {PM_1, …, PM_{N_PMc−1}} or j^c = PM_1
      { 0       otherwise
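The two cost cases can be sketched for a two-component system. All numeric values are assumed for illustration, and the production reward is again written as a negative cost so that the objective is a pure minimization:

```python
# Sketch of the two cost cases of Section 9.2.4.4 (assumed values).
C_I, G, Ts = 5000.0, 2000.0, 730.0
C_CM = [3000.0, 4000.0]     # assumed per-stage CM cost of components 1, 2
C_PM = [1000.0, 1500.0]     # assumed per-stage PM cost of components 1, 2

def system_cost(j, u, i, price):
    """C((j^1..j^NC), (u^1..u^NC), (i^1..i^NC)) for the 2-component example."""
    if (all(ic.startswith("W") for ic in i) and not any(u)
            and all(jc.startswith("W") for jc in j)):
        return -G * Ts * price              # Case 1: production reward
    cost = C_I                              # Case 2: interruption cost ...
    for c, (jc, ic) in enumerate(zip(j, i)):
        if ic.startswith("CM") or jc == "CM1":
            cost += C_CM[c]                 # ... plus corrective maintenance
        elif ic.startswith("PM") or jc == "PM1":
            cost += C_PM[c]
    return cost

print(system_cost(("W1", "W1"), (0, 0), ("W0", "W0"), 0.0625))   # -91250.0
```

Grouping effects appear directly: one interruption cost C_I is paid per stage, however many components are maintained simultaneously.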
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.
• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.
• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for high discount rates, the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
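The backward recursion above can be reproduced with a short value iteration script over the arc costs of the example:

```python
# Backward value iteration for the shortest-path example of Appendix A.
# C[(k, i, j)] is the arc cost from state i at stage k to state j at stage k+1.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}

J = {(4, 0): 0.0}          # terminal cost phi(0) = 0
policy = {}
for k in range(3, -1, -1):
    for i in {i for (kk, i, _) in C if kk == k}:
        # cost-to-go of each successor j reachable from state i at stage k
        choices = {j: C[(k, i, j)] + J[(k + 1, j)]
                   for (kk, ii, j) in C if kk == k and ii == i}
        j_best = min(choices, key=choices.get)
        J[(k, i)] = choices[j_best]
        policy[(k, i)] = j_best

print(J[(0, 0)])   # optimal cost from A: 8.0
```

The script recovers J*_0(0) = 8 and the optimal first decision u*_0(0) = 2, matching the hand computation.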
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A.-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan, Alagar, Ahyagarajan, Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994
[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965
[8] R Bellman Dynamic Programming Princeton University Press Princeton1957
[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997
[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976
[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979
65
[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005
[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996
[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006
[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991
[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997
[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966
[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004
[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982
[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004
[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004
[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004
[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996
[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999
66
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
67
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005
68
Costs

$C_{PM}^c$ : cost per stage of preventive maintenance for component $c$
$C_{CM}^c$ : cost per stage of corrective maintenance for component $c$
$C_N^c(i)$ : terminal cost if component $c$ is in state $i$

Variables

$i_c,\ c \in \{1,\dots,N_C\}$ : state of component $c$ at the current stage
$i_{N_C+1}$ : electricity state at the current stage
$j_c,\ c \in \{1,\dots,N_C\}$ : state of component $c$ at the next stage
$j_{N_C+1}$ : electricity state at the next stage
$u_c,\ c \in \{1,\dots,N_C\}$ : decision variable for component $c$

State and Control Space

$x_k^c,\ c \in \{1,\dots,N_C\}$ : state of component $c$ at stage $k$
$x^c$ : a component state
$x_k^{N_C+1}$ : electricity state at stage $k$
$u_k^c$ : maintenance decision for component $c$ at stage $k$

Probability functions

$\lambda_c(i)$ : failure probability function for component $c$

Sets

$\Omega_{x^c}$ : state space for component $c$
$\Omega_{x^{N_C+1}}$ : electricity state space
$\Omega_{u^c}(i_c)$ : decision space for component $c$ in state $i_c$
9.2.3 Assumptions

• The system is composed of $N_C$ components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed to be perfectly known. This function is denoted $\lambda_c(t)$ for component $c \in \{1,\dots,N_C\}$.

• If component $c$ fails during stage $k$, corrective maintenance is undertaken for $N_{CM}^c$ stages with a cost of $C_{CM}^c$ per stage.

• At each stage it is possible to decide to replace a component in order to prevent corrective maintenance. A preventive replacement of component $c$ takes $N_{PM}^c$ stages with a cost of $C_{PM}^c$ per stage.

• An interruption cost $C_I$ is incurred whenever maintenance is carried out on the system.

• The average production of the generating unit is $G$ kW. If none of the components of the unit is in preventive maintenance or failed, $G \cdot T_s$ kWh are produced during the stage ($T_s$ in hours).

• A terminal cost $C_N^c$ can be used to penalize the terminal-stage condition of component $c$.
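The data assumed known above can be collected into a small parameter container. The following sketch is illustrative only; the class and field names (`ComponentParams`, `SystemParams`) and all numeric values are my own, not from the thesis.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ComponentParams:
    """Assumed-known data for one component c (names are illustrative)."""
    C_PM: float                 # cost per stage of preventive maintenance
    C_CM: float                 # cost per stage of corrective maintenance
    N_PM: int                   # stages needed for a preventive replacement
    N_CM: int                   # stages needed for a corrective replacement
    N_W: int                    # number of working (age) states
    failure_prob: Callable[[int], float]  # lambda_c(i): failure prob. in age state i

@dataclass
class SystemParams:
    components: List[ComponentParams]   # the N_C components in series
    C_I: float                          # interruption cost when maintenance occurs
    G: float                            # average production of the unit (kW)
    T_s: float                          # stage duration (hours)

# Made-up numbers, for illustration only:
pump = ComponentParams(C_PM=10.0, C_CM=40.0, N_PM=1, N_CM=3, N_W=5,
                       failure_prob=lambda i: min(1.0, 0.05 * (i + 1)))
system = SystemParams(components=[pump], C_I=25.0, G=1000.0, T_s=168.0)
print(system.components[0].C_CM)  # 40.0
```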
9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

$$X_k = \begin{pmatrix} x_k^1 \\ \vdots \\ x_k^{N_C} \\ x_k^{N_C+1} \end{pmatrix} \qquad (9.2)$$

$x_k^c,\ c \in \{1,\dots,N_C\}$ represents the state of component $c$, and $x_k^{N_C+1}$ represents the electricity state.
Component space
The numbers of CM and PM states for component $c$ are $N_{CM}^c$ and $N_{PM}^c$ respectively. The number of W states for component $c$, $N_W^c$, is decided in the same way as for the one-component model.

The state space related to component $c$ is denoted $\Omega_{x^c}$:

$$x_k^c \in \Omega_{x^c} = \{W_0,\dots,W_{N_W^c},\ PM_1,\dots,PM_{N_{PM}^c-1},\ CM_1,\dots,CM_{N_{CM}^c-1}\}$$

Electricity space
Same as in Section 8.1.
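As a concrete illustration, the component state space above can be enumerated in a few lines. This is a sketch; encoding the states as strings like `"W0"` or `"PM1"` is my own choice.

```python
def component_state_space(N_W, N_PM, N_CM):
    """Enumerate Omega_xc for one component: working/age states W0..W_NW,
    then the intermediate maintenance states PM1..PM_{NPM-1} and
    CM1..CM_{NCM-1}, following the set written above."""
    working = [f"W{i}" for i in range(N_W + 1)]
    pm = [f"PM{i}" for i in range(1, N_PM)]
    cm = [f"CM{i}" for i in range(1, N_CM)]
    return working + pm + cm

print(component_state_space(2, 3, 2))
# ['W0', 'W1', 'W2', 'PM1', 'PM2', 'CM1']
```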
9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to perform preventive maintenance or to do nothing, depending on the state of the system:

$u_k^c = 0$ : no preventive maintenance on component $c$
$u_k^c = 1$ : preventive maintenance on component $c$

The decision variables constitute a decision vector:

$$U_k = \begin{pmatrix} u_k^1 \\ u_k^2 \\ \vdots \\ u_k^{N_C} \end{pmatrix} \qquad (9.3)$$

The decision space for each decision variable is defined by

$$\forall c \in \{1,\dots,N_C\}: \quad \Omega_{u^c}(i_c) = \begin{cases} \{0,1\} & \text{if } i_c \in \{W_0,\dots,W_{N_W^c}\} \\ \emptyset & \text{otherwise} \end{cases}$$
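The joint decision space for a whole system state can be enumerated as the Cartesian product of the per-component decision spaces. A sketch: a component with an empty decision space is given the single forced choice 0 (no action) so that the product stays non-empty; this reading is left implicit in the text.

```python
from itertools import product

def decision_space(ic, working_states):
    """Omega_uc(ic): {0, 1} when component c is in a working state, else empty."""
    return (0, 1) if ic in working_states else ()

def joint_decisions(state, working_states):
    """All decision vectors U_k for a system state (i_1, ..., i_NC).
    A component in maintenance contributes the single forced choice 0."""
    per_component = [decision_space(ic, working_states) or (0,) for ic in state]
    return list(product(*per_component))

W = {"W0", "W1", "W2"}
print(joint_decisions(("W1", "CM1"), W))  # [(0, 0), (1, 0)]
```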
9.2.4.3 Transition Probability

The component state variables $x^c$ are independent of the electricity state $x^{N_C+1}$. Consequently,

$$P(X_{k+1} = j \mid U_k = U, X_k = i) \qquad (9.4)$$
$$= P\big((j_1,\dots,j_{N_C}) \mid (u_1,\dots,u_{N_C}), (i_1,\dots,i_{N_C})\big) \cdot P(j_{N_C+1} \mid i_{N_C+1}) \qquad (9.5)$$

The transition probabilities of the electricity states, $P(j_{N_C+1} \mid i_{N_C+1})$, are similar to those of the one-component model. They can be defined at each stage $k$ by a transition matrix as in the example of Section 8.1.
Component state transitions

The state variables $x^c$ are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. Consequently, different cases must be considered.

Case 1
If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If $\forall c \in \{1,\dots,N_C\}:\ i_c \in \{W_1,\dots,W_{N_W^c}\}$,

$$P\big((j_1,\dots,j_{N_C}) \mid 0, (i_1,\dots,i_{N_C})\big) = \prod_{c=1}^{N_C} P(j_c \mid 0, i_c)$$
Case 2
If one of the components is in maintenance, or preventive maintenance is decided for it, then

$$P\big((j_1,\dots,j_{N_C}) \mid (u_1,\dots,u_{N_C}), (i_1,\dots,i_{N_C})\big) = \prod_{c=1}^{N_C} P^c$$

with

$$P^c = \begin{cases} P(j_c \mid 1, i_c) & \text{if } u_c = 1 \text{ or } i_c \notin \{W_1,\dots,W_{N_W^c}\} \\ 1 & \text{if } u_c = 0,\ i_c \in \{W_1,\dots,W_{N_W^c}\} \text{ and } j_c = i_c \\ 0 & \text{otherwise} \end{cases}$$
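The two cases can be combined into one joint transition probability as a product of per-component terms. The sketch below follows my reading of the text: components in (or entering) maintenance follow the maintenance dynamics, while working components are frozen whenever the system is down; the toy per-component dynamics `P_toy` are invented for illustration.

```python
def joint_transition_prob(j, u, i, P_c, working):
    """P((j_1..j_NC) | (u_1..u_NC), (i_1..i_NC)) as a product of
    per-component terms.  P_c[c](jc, flag, ic) is the per-component
    transition probability; flag=1 selects the maintenance dynamics."""
    system_working = all(ic in working for ic in i) and all(uc == 0 for uc in u)
    prob = 1.0
    for c in range(len(i)):
        if system_working:
            prob *= P_c[c](j[c], 0, i[c])           # Case 1: independent ageing
        elif u[c] == 1 or i[c] not in working:
            prob *= P_c[c](j[c], 1, i[c])           # Case 2: (entering) maintenance
        else:
            prob *= 1.0 if j[c] == i[c] else 0.0    # working but system down: frozen
    return prob

def P_toy(jc, flag, ic):
    """Toy per-component dynamics: ageing W0 -> W1 -> W1; maintenance sends to W0."""
    if flag == 0:
        return 1.0 if {"W0": "W1", "W1": "W1"}.get(ic) == jc else 0.0
    return 1.0 if jc == "W0" else 0.0

W = {"W0", "W1"}
print(joint_transition_prob(("W1", "W1"), (0, 0), ("W0", "W0"), [P_toy, P_toy], W))  # 1.0
```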
9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If $\forall c \in \{1,\dots,N_C\}:\ i_c \in \{W_1,\dots,W_{N_W^c}\}$,

$$C\big((j_1,\dots,j_{N_C}) \mid 0, (i_1,\dots,i_{N_C})\big) = G \cdot T_s \cdot C_E(i_{N_C+1}, k)$$

Case 2
When the system is in maintenance or fails during the stage, an interruption cost $C_I$ is incurred, together with the sum of the costs of all maintenance actions:

$$C\big((j_1,\dots,j_{N_C}) \mid (u_1,\dots,u_{N_C}), (i_1,\dots,i_{N_C})\big) = C_I + \sum_{c=1}^{N_C} C^c$$

with

$$C^c = \begin{cases} C_{CM}^c & \text{if } i_c \in \{CM_1,\dots,CM_{N_{CM}^c}\} \text{ or } j_c = CM_1 \\ C_{PM}^c & \text{if } i_c \in \{PM_1,\dots,PM_{N_{PM}^c}\} \text{ or } j_c = PM_1 \\ 0 & \text{otherwise} \end{cases}$$
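The stage cost can be sketched directly from these two cases. All names and numbers below are illustrative; the Case-1 value is the reward for the energy produced, written with the sign used in the text (the sign convention in the actual optimization depends on whether net cost is minimized).

```python
def stage_cost(j, u, i, comps, C_I, G, T_s, C_E, working, pm_states, cm_states):
    """Stage cost following the two cases above.  comps[c] holds the
    per-stage costs C_CM and C_PM from the assumptions; C_E is the
    electricity price in the current electricity state."""
    failure = any(jc in cm_states or jc == "CM1" for jc in j)
    if all(ic in working for ic in i) and all(uc == 0 for uc in u) and not failure:
        return G * T_s * C_E                      # Case 1: reward for G*T_s kWh
    cost = C_I                                    # Case 2: interruption cost...
    for c in range(len(i)):
        if i[c] in cm_states or j[c] == "CM1":
            cost += comps[c]["C_CM"]              # ...plus ongoing/starting CM
        elif i[c] in pm_states or j[c] == "PM1":
            cost += comps[c]["C_PM"]              # ...plus ongoing/starting PM
    return cost

comps = [{"C_CM": 40.0, "C_PM": 10.0}]
working, pm_states, cm_states = {"W0", "W1"}, {"PM1"}, {"CM1", "CM2"}
print(stage_cost(("W1",), (0,), ("W0",), comps, 25.0, 1000.0, 1.0, 0.05,
                 working, pm_states, cm_states))  # 50.0
```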
9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas that could have an impact on the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be carried out at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm has empirically been shown to converge fastest, although for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities over a finite horizon seems promising as a way to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single-state-variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for one.
The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:

$$J_4^*(0) = \phi(0) = 0$$

Stage 3:

$$J_3^*(0) = J^*(H) = C(3,0,0) = 4, \qquad u_3^*(0) = u^*(H) = 0$$
$$J_3^*(1) = J^*(I) = C(3,1,0) = 2, \qquad u_3^*(1) = u^*(I) = 0$$
$$J_3^*(2) = J^*(J) = C(3,2,0) = 7, \qquad u_3^*(2) = u^*(J) = 0$$

Stage 2:

$$J_2^*(0) = J^*(E) = \min\{J_3^*(0) + C(2,0,0),\; J_3^*(1) + C(2,0,1)\} = \min\{4+2,\; 2+5\} = 6$$
$$u_2^*(0) = u^*(E) = \operatorname{arg\,min}_{u \in \{0,1\}} \{J_3^*(0) + C(2,0,0),\; J_3^*(1) + C(2,0,1)\} = 0$$

$$J_2^*(1) = J^*(F) = \min\{J_3^*(0) + C(2,1,0),\; J_3^*(1) + C(2,1,1),\; J_3^*(2) + C(2,1,2)\} = \min\{4+7,\; 2+3,\; 7+2\} = 5$$
$$u_2^*(1) = u^*(F) = \operatorname{arg\,min}_{u \in \{0,1,2\}} \{J_3^*(0) + C(2,1,0),\; J_3^*(1) + C(2,1,1),\; J_3^*(2) + C(2,1,2)\} = 1$$

$$J_2^*(2) = J^*(G) = \min\{J_3^*(1) + C(2,2,1),\; J_3^*(2) + C(2,2,2)\} = \min\{2+1,\; 7+2\} = 3$$
$$u_2^*(2) = u^*(G) = \operatorname{arg\,min}_{u \in \{1,2\}} \{J_3^*(1) + C(2,2,1),\; J_3^*(2) + C(2,2,2)\} = 1$$

Stage 1:

$$J_1^*(0) = J^*(B) = \min\{J_2^*(0) + C(1,0,0),\; J_2^*(1) + C(1,0,1)\} = \min\{6+4,\; 5+6\} = 10$$
$$u_1^*(0) = u^*(B) = \operatorname{arg\,min}_{u \in \{0,1\}} \{J_2^*(0) + C(1,0,0),\; J_2^*(1) + C(1,0,1)\} = 0$$

$$J_1^*(1) = J^*(C) = \min\{J_2^*(0) + C(1,1,0),\; J_2^*(1) + C(1,1,1),\; J_2^*(2) + C(1,1,2)\} = \min\{6+2,\; 5+1,\; 3+3\} = 6$$
$$u_1^*(1) = u^*(C) = \operatorname{arg\,min}_{u \in \{0,1,2\}} \{J_2^*(0) + C(1,1,0),\; J_2^*(1) + C(1,1,1),\; J_2^*(2) + C(1,1,2)\} = 1 \text{ or } 2$$

$$J_1^*(2) = J^*(D) = \min\{J_2^*(1) + C(1,2,1),\; J_2^*(2) + C(1,2,2)\} = \min\{5+5,\; 3+2\} = 5$$
$$u_1^*(2) = u^*(D) = \operatorname{arg\,min}_{u \in \{1,2\}} \{J_2^*(1) + C(1,2,1),\; J_2^*(2) + C(1,2,2)\} = 2$$

Stage 0:

$$J_0^*(0) = J^*(A) = \min\{J_1^*(0) + C(0,0,0),\; J_1^*(1) + C(0,0,1),\; J_1^*(2) + C(0,0,2)\} = \min\{10+2,\; 6+4,\; 5+3\} = 8$$
$$u_0^*(0) = u^*(A) = \operatorname{arg\,min}_{u \in \{0,1,2\}} \{J_1^*(0) + C(0,0,0),\; J_1^*(1) + C(0,0,1),\; J_1^*(2) + C(0,0,2)\} = 2$$
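These computations can be reproduced with a short backward-induction script. A sketch: the dictionary encoding of the arc costs and the mapping of the node labels A..J to (stage, state) pairs are my own; decision $u$ is the index of the successor state, and the terminal cost is $\phi(0) = 0$.

```python
# Arc costs C[(k, i, u)] read off from the computations above: at stage k,
# from state i, decision u moves to state u of stage k+1.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,   # A -> B, C, D
    (1, 0, 0): 4, (1, 0, 1): 6,                 # B -> E, F
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,   # C -> E, F, G
    (1, 2, 1): 5, (1, 2, 2): 2,                 # D -> F, G
    (2, 0, 0): 2, (2, 0, 1): 5,                 # E -> H, I
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,   # F -> H, I, J
    (2, 2, 1): 1, (2, 2, 2): 2,                 # G -> I, J
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,   # H, I, J -> terminal node
}

def solve(C, N=4, phi=0.0):
    """Backward induction: J_k(i) = min_u [ C(k,i,u) + J_{k+1}(u) ]."""
    J = {(N, 0): phi}                   # terminal cost phi(0) = 0
    policy = {}
    for k in range(N - 1, -1, -1):
        for i in sorted({s for (kk, s, _) in C if kk == k}):
            opts = {u: c + J[(k + 1, u)]
                    for (kk, ii, u), c in C.items() if kk == k and ii == i}
            policy[(k, i)] = min(opts, key=opts.get)
            J[(k, i)] = opts[policy[(k, i)]]
    return J, policy

J, policy = solve(C)
print(J[(0, 0)], policy[(0, 0)])  # 8.0 2
```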
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
bull An interruption cost CI is consider whatever the maintenance is done on thesystem
bull The average production of the generating unit is G kW If none of the compo-nent of the unit is in preventive maintenance or failure G middotTs kWh is producedduring the stage (Ts in hours)
bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c
924 Model Description
9241 State Space
The state of the system can be represented by a vector as in (92)
Xk =
x1k
xNckxNc+1k
(92)
xck c isin 1 NC represent the state of component c
xNc+1k represents the electricity state
Component SpaceThe number of CM and PM states for component c corresponds respectively toNCMc and NPMc The number of W states for each component c NWc is decided inthe same way that for one component
The state space related to the component c is noted Ωxc
xck isin Ωxc
= W0 WNWc PM1 PMNPMc minus1 CM1 CMNCMc minus1
Electricity SpaceSame as in Section 81
9242 Decision Space
At each stage the decision maker must decide for each component that is not inmaintenance to do preventive maintenance or do nothing depending on the stateof the system
57
uck = 0 no preventive maintenance on component n
uck = 1 preventive maintenance on component n
The decision variables constitute a decision vector
Uk =
u1k
u2k
uNck
(93)
The decision space for each decision variable can be defined by
forallc isin 1 Nc Ωuc
(ic) =
0 1 if ic isin W0 WNWc
empty else
9243 Transition Probability
The state variables xc are independent of the electricity state xNc+1 Consequently
P (Xk+1 = j | Uk = UXk = i) (94)
= P ((j1 jNC ) (u1 uNC ) (i1 iNC )) middot P (jNC+1 jNC+1) (95)
The probabilities transition of the electricity states P (jNC+1 iNC+1) are similarto the one-component model They can be defined at each stage k by a transitionmatrices as in the example of Section 81
Component states transitions
The state variables xc are not independent of each other Indeed if one componentfails or is in maintenance the components are not ageing since the system is notworking In consequence different cases must be considered
Case 1
If all the component are working no maintenance is done the propability transitionof the whole system is the product of the probability transition of each componentconsidered independently
If forallc isin 1 NC yck isin W1 WNWn
P ((j1 jNC ) 0 (i1 iNC )) =NCprod
c=1
P (ic 0 jc)
Case 2
58
If one of the component is in maintenance or the decision of preventive maintenanceis
P ((j1 jNC ) (u1 uNC ) (i1 iNC )) =NCprod
n=1
P c
with P c =
P (jc 1 ic) if uc = 1 or ic 6isin W1 WNWc
1 if ic 6isin W0 WNWc minus1 and ic = jc
0 else
9244 Cost Function
As for the transition probabilities there are 2 cases
Case 1If all the components are working no maintenance is decided and no failure happensa reward for the electricity produced is obtained
If forallc isin 1 NC yck isin W1 WNWn
C((j1 jNC ) 0 (i1 iNC )) = G middot Ts middot CE(iNC+1 k)
Case 2When the system is in maintenance or fails during the stage an interruption costCI is considered as well as the sum of all the maintenance actions
C((j1 jNC ) (u1 uNC ) (i1 iNC )) = C(I) +NCsum
c=1
Cc
with Cc =
CCMc if ic isin CM1 CMNCMc or jc = CM1
CPMc if ic isin PM1 PMNPMc or jn = PM1
0 else
93 Possible Extensions
The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model
bull Manpower It would be interesting to limit the number of maintenance actionspossible to do at the same time A solution would be to consider a globaldecision space and not individual decision space for each component statevariable
59
bull Include other types of maintenance actions In the model replacement wasthe only maintenance action possible In reality there are a lot of possiblemaintenance actions such as minor repair major repair etc They could bemodelled by adding possible maintenance decisions in the model
bull Time to repair is non deterministic So that it is possible to model a stochasticreparation time by adding probabilities transition for the maintenance states
bull Use of deterioration states If monitoring or inspection of some componentsare possible deterioration state variables could be included in the model
bull Other forecasting states It could be interesting to add other forecasting stateinformation such as weather andor load states
60
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems
The theory of Dynamic Programming was introduced with finite horizon and infi-nite horizon stochastic approaches as well as Approximate Dynamic Programming(Reinforcement Learning) methods to solve infinite horizon SDP models A com-parison of the methods available for infinite horizon SDP was made Problems witha limited state space can be solved exactly The Policy Iteration algorithm is provedempirically to converge the faster However for high discount rate the Value Iter-ation algorithm can be better Linear Programming can also be used if additionalconstraints need to be included in the model Approximate Dynamic Programmingmethods are necessary for large state space
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for one.
The main limitation of Dynamic Programming is the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With recent advances in ADP methods this limitation could be overcome. These methods have so far mainly been applied to optimal control, but there are new opportunities for applying them to fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of a finite horizon model and requires the problem to be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to the monitoring of single components (possibly with several monitored parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of the complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2
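For checking, the backward recursion can be reproduced in a few lines of Python. This is a sketch: the arc-cost table is transcribed from the calculation above, with stage-k states numbered 0, 1, 2 as in the example.

```python
# Arc costs C[k][i] = {j: cost} (stage k, state i, successor state j),
# transcribed from the shortest path example.
C = {
    0: {0: {0: 2, 1: 4, 2: 3}},
    1: {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},
    2: {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},
    3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},
}

# Backward value iteration: J[k][i] = min_j (C(k,i,j) + J[k+1][j]).
J = {4: {0: 0.0}}            # terminal cost phi(0) = 0
policy = {}
for k in (3, 2, 1, 0):
    J[k], policy[k] = {}, {}
    for i, arcs in C[k].items():
        j_best = min(arcs, key=lambda j: arcs[j] + J[k + 1][j])
        policy[k][i] = j_best
        J[k][i] = arcs[j_best] + J[k + 1][j_best]

print(J[0][0])  # 8.0, the optimal cost from node A
```

Running it reproduces the stage costs computed by hand, e.g. J*_1 = (10, 6, 5) and the optimal first decision u*_0(0) = 2.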
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
lowast3 (0) + C(0 0) Jlowast3 (1) + C(1 0) = 0
Jlowast2 (1) = Jlowast(F ) = min Jlowast(3 0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = min 4 + 7 2 + 3 7 + 2 = 5ulowast2(1) = Jlowast(F ) = argminuisin012 J
lowast3 (0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = 2
Jlowast2 (2) = Jlowast(G) = min Jlowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = min 2 + 1 7 + 2 = 3ulowast2(2) = Jlowast(G) = argminuisin12 J
lowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = 1
Stage 1Jlowast1 (0) = Jlowast(B) = min Jlowast2 (0) + C(1 0 0) Jlowast2 (1) + C(1 0 1) = min 6 + 4 5 + 6 = 10ulowast1(0) = Jlowast(B) = argminuisin01 J
lowast2(0) + C(1 0 0) Jlowast2 (1) + C(1 1 0) = 0Jlowast1 (1) = Jlowast(C) = min Jlowast2 (0) + C(1 1 0) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = min 6 + 2 5 + 1 3 + 3 = 6ulowast1(1) = Jlowast(C) = argminuisin012 J
lowast2 (0) + C(1 1 1) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = 1 or 2
Jlowast1 (2) = Jlowast(D) = min Jlowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = min 5 + 5 3 + 2 = 5ulowast1(2) = Jlowast(D) = argminuisin12 J
lowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = 2
Stage 0Jlowast0 (0) = Jlowast(A) = min Jlowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = min 10 + 2 6 + 4 5 + 3 = 8ulowast0(0) = Jlowast(A) = argminuisin012 J
lowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = 2
63
Reference List
[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001
[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995
[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006
[4] N Andreacuteasson Optimisation of opportunistic replacement activities in deter-ministic and stochastic multi-component systems Technical report ChalmersGoumlteborg University 2004 Licentiate Thesis
[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996
[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994
[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965
[8] R Bellman Dynamic Programming Princeton University Press Princeton1957
[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997
[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976
[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979
65
[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005
[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996
[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006
[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991
[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997
[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966
[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004
[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982
[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004
[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004
[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004
[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996
[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999
66
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
67
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005
68
• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding further maintenance decisions to the model.
• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.
• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.
• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
Chapter 10
Conclusions and Future Work
This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.
The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods for solving infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm has empirically been shown to converge fastest, although for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. For large state spaces, Approximate Dynamic Programming methods are necessary.
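The comparison above can be made concrete with a small numerical sketch. The code below is illustrative only: it assumes a hypothetical two-state, two-action replacement MDP (state 0 = working, state 1 = failed; action 0 = do nothing, action 1 = replace) with made-up costs and transition probabilities, and solves it with both Value Iteration and exact Policy Iteration.

```python
import numpy as np

# Hypothetical replacement MDP (not from the thesis):
# P[a, s, s'] = transition probability, cost[a, s] = immediate cost.
P = np.array([[[0.9, 0.1],      # do nothing: working may fail
               [0.0, 1.0]],     # do nothing: failed stays failed
              [[1.0, 0.0],      # replace: back to working
               [1.0, 0.0]]])
cost = np.array([[0.0, 10.0],   # do nothing: downtime cost when failed
                 [5.0, 5.0]])   # replace: fixed replacement cost
gamma = 0.95                    # discount factor

def value_iteration(P, cost, gamma, tol=1e-8):
    """Repeated Bellman backups until the value function converges."""
    J = np.zeros(P.shape[1])
    while True:
        Q = cost + gamma * (P @ J)          # Q[a, s]
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, Q.argmin(axis=0)
        J = J_new

def policy_iteration(P, cost, gamma):
    """Exact policy evaluation (linear solve) + greedy improvement."""
    n_s = P.shape[1]
    policy = np.zeros(n_s, dtype=int)
    while True:
        P_pi = P[policy, np.arange(n_s)]    # transitions under the policy
        c_pi = cost[policy, np.arange(n_s)]
        # Solve (I - gamma * P_pi) J = c_pi for the policy's cost-to-go.
        J = np.linalg.solve(np.eye(n_s) - gamma * P_pi, c_pi)
        new_policy = (cost + gamma * (P @ J)).argmin(axis=0)
        if np.array_equal(new_policy, policy):
            return J, policy
        policy = new_policy
```

On this toy problem both methods agree (do nothing while working, replace on failure), but Policy Iteration terminates after a couple of policy changes, while Value Iteration needs many sweeps when the discount factor is close to 1, which matches the empirical observation above.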
A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.
A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is that the next time to maintenance can be optimized based on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for one.
The main limitation of Dynamic Programming is the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With recent advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to other fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP could, for example, be generalized to multi-variable models where different parameters of a system are monitored.
In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximation of a finite horizon model and requires the problem to be stationary over time.
An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to the monitoring of single components (possibly with multiple monitored parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4,0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2
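The backward recursion above can be reproduced with a short script. This is a sketch, not code from the thesis: the arc costs C(t, x, u) are transcribed from the example, with the decision u interpreted as the state entered at stage t+1 and a single terminal state at stage 4.

```python
# Backward value iteration for the shortest-path example above.
# C[(t, x, u)]: cost at stage t, state x, under decision u, where the
# decision u is the state entered at stage t + 1.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,   # A -> B, C, D
    (1, 0, 0): 4, (1, 0, 1): 6,                 # B -> E, F
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,   # C -> E, F, G
    (1, 2, 1): 5, (1, 2, 2): 2,                 # D -> F, G
    (2, 0, 0): 2, (2, 0, 1): 5,                 # E -> H, I
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,   # F -> H, I, J
    (2, 2, 1): 1, (2, 2, 2): 2,                 # G -> I, J
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,   # H, I, J -> terminal
}

def solve(C, n_stages=4):
    """Backward recursion: J_t(x) = min_u [C(t, x, u) + J_{t+1}(u)]."""
    J = {(n_stages, 0): 0}          # terminal cost phi(0) = 0
    u_opt = {}
    for t in range(n_stages - 1, -1, -1):
        for x in {x for (tt, x, _) in C if tt == t}:
            # candidate cost-to-go for each feasible decision u from (t, x)
            best = {u: c + J[(t + 1, u)]
                    for (tt, xx, u), c in C.items() if (tt, xx) == (t, x)}
            u_opt[(t, x)] = min(best, key=best.get)
            J[(t, x)] = best[u_opt[(t, x)]]
    return J, u_opt

J, u = solve(C)
print(J[(0, 0)], u[(0, 0)])   # 8 2 -> J*(A) = 8, first decision u*(A) = 2
```

Following u_opt forward from (0, 0) recovers the optimal path A → D → G → I → terminal, with total cost 3 + 2 + 1 + 2 = 8, as in the hand calculation.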
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306. SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence '99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
67
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005
68
The main limitation of dynamic programming is the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the recent advances in approximate dynamic programming (ADP) methods, this limitation could be overcome. No application of ADP to maintenance was found in the literature; the methods have until now mainly been applied to optimal control, but there are new opportunities for applying them to fields such as maintenance optimization. The condition-based maintenance models proposed using MDPs or SMDPs could, for example, be generalized to multi-variable models in which several parameters of a system are monitored.
In the power industry, maintenance contracts over a finite time are common. From this perspective, maintenance optimization should focus on finite-horizon models; however, few finite-horizon models are proposed in the literature. Dynamic programming can handle a finite horizon in two ways: directly, with a finite-horizon model, or approximately, with a discounted infinite-horizon model, which however requires the problem to be stationary over time.
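For reference, the two formulations mentioned above differ in their optimality equations. In a generic notation (state i, decision u, transition probabilities P_ij(u), stage cost C, terminal cost Φ, discount factor 0 < λ < 1, horizon N), which may deviate slightly from the notation used in the earlier chapters:

```latex
% Finite-horizon backward recursion, for stages n = N-1, ..., 0:
J_n(i) = \min_{u} \Big\{ C_n(i,u) + \sum_j P_{ij}(u)\, J_{n+1}(j) \Big\},
\qquad J_N(i) = \Phi(i).

% Discounted infinite-horizon (stationary) optimality equation:
J^*(i) = \min_{u} \Big\{ C(i,u) + \lambda \sum_j P_{ij}(u)\, J^*(j) \Big\}.
```

The second equation is stage-independent, which is why the discounted model can only approximate a finite-horizon problem whose costs and transition probabilities do not vary over time.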
An idea could be to extend the finite-horizon model proposed in this thesis. Markov decision processes and reinforcement learning could be applied to the monitoring of single components (possibly with several monitored parameters), while the finite-horizon approach could use the results from the single-component models to optimize the maintenance of the complete system. The components in the finite-horizon model could be simplified to a small number of deterioration/age states to limit the complexity of the model.
Appendix A
Solution of the Shortest Path Example
Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2
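The backward recursion above can also be reproduced with a short script. This is a minimal sketch (not part of the thesis): node names A–J are replaced by (stage, state) pairs, the control u is the next state, and the arc costs C(k, i, u) are transcribed from the example.

```python
# Backward value iteration for the shortest-path example.
# C[(k, i, u)] = cost at stage k, state i, when choosing next state u.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}

N = 4                    # decisions are taken at stages 0, ..., N-1
J = {(N, 0): 0}          # terminal cost: J*_4(0) = phi(0) = 0
policy = {}              # optimal decision u*_k(i) at each (stage, state)

for k in range(N - 1, -1, -1):
    states = {i for (kk, i, u) in C if kk == k}
    for i in states:
        # Feasible controls are the next states u with a defined arc cost.
        candidates = {u: C[(k, i, u)] + J[(k + 1, u)]
                      for (kk, ii, u) in C if kk == k and ii == i}
        u_best = min(candidates, key=candidates.get)
        J[(k, i)] = candidates[u_best]
        policy[(k, i)] = u_best

print(J[(0, 0)])  # optimal cost from node A; the example gives 8
```

Following `policy` forward from (0, 0) recovers the optimal path A, D, G, I of the example, with total cost 3 + 2 + 1 + 2 = 8.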
Reference List
[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
Appendix A
Solution of the Shortest Path
Example
Solution of the shortest path problem with the value iteration algorithmStage 4Jlowast(4 0) = φ(0) = 0Stage 3Jlowast3 (0) = Jlowast(H) = C(3 0 0) = 4 ulowast3(0) = ulowast(H) = 0Jlowast3 (1) = Jlowast(I) = C(3 1 0) = 2 ulowast3(1) = ulowast(I) = 0Jlowast3 (2) = Jlowast(J) = C(3 2 0) = 7 ulowast3(2) = ulowast(J) = 0Stage 2Jlowast2 (0) = Jlowast(E) = min Jlowast3 (0) + C(2 0 0) Jlowast3 (1) + C(2 0 1) = min 4 + 2 2 + 5 = 6ulowast2(0) = Jlowast(E) = argminuisin01 J
lowast3 (0) + C(0 0) Jlowast3 (1) + C(1 0) = 0
Jlowast2 (1) = Jlowast(F ) = min Jlowast(3 0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = min 4 + 7 2 + 3 7 + 2 = 5ulowast2(1) = Jlowast(F ) = argminuisin012 J
lowast3 (0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = 2
Jlowast2 (2) = Jlowast(G) = min Jlowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = min 2 + 1 7 + 2 = 3ulowast2(2) = Jlowast(G) = argminuisin12 J
lowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = 1
Stage 1Jlowast1 (0) = Jlowast(B) = min Jlowast2 (0) + C(1 0 0) Jlowast2 (1) + C(1 0 1) = min 6 + 4 5 + 6 = 10ulowast1(0) = Jlowast(B) = argminuisin01 J
lowast2(0) + C(1 0 0) Jlowast2 (1) + C(1 1 0) = 0Jlowast1 (1) = Jlowast(C) = min Jlowast2 (0) + C(1 1 0) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = min 6 + 2 5 + 1 3 + 3 = 6ulowast1(1) = Jlowast(C) = argminuisin012 J
lowast2 (0) + C(1 1 1) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = 1 or 2
Jlowast1 (2) = Jlowast(D) = min Jlowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = min 5 + 5 3 + 2 = 5ulowast1(2) = Jlowast(D) = argminuisin12 J
lowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = 2
Stage 0Jlowast0 (0) = Jlowast(A) = min Jlowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = min 10 + 2 6 + 4 5 + 3 = 8ulowast0(0) = Jlowast(A) = argminuisin012 J
lowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = 2
63
Reference List
[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001
[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995
[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006
[4] N Andreacuteasson Optimisation of opportunistic replacement activities in deter-ministic and stochastic multi-component systems Technical report ChalmersGoumlteborg University 2004 Licentiate Thesis
[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996
[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994
[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965
[8] R Bellman Dynamic Programming Princeton University Press Princeton1957
[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997
[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976
[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979
65
[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005
[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996
[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006
[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991
[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997
[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966
[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004
[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982
[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004
[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004
[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004
[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996
[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999
66
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999
[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999
[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006
[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007
[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006
[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988
[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993
[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006
67
[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006
[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007
[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004
[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998
[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006
[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002
[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006
[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research
[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995
[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005
68
Reference List
[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001
[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995
[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006
[4] N Andreacuteasson Optimisation of opportunistic replacement activities in deter-ministic and stochastic multi-component systems Technical report ChalmersGoumlteborg University 2004 Licentiate Thesis
[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996
[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994
[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965
[8] R Bellman Dynamic Programming Princeton University Press Princeton1957
[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997
[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976
[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979
65
[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005
[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996
[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006
[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991
[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997
[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966
[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004
[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982
[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004
[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004
[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004
[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996
[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999
66
[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997
[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983
[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006
[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G. K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D. I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R. E. Wildeman, and F. A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T. V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of the 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L. P. Kaelbling, M. L. Littman, and A. P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R. E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M. L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.
[30] M. K. C. Marwali and S. M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R. P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K. S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K. S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Thyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L. M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C. L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R. E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.